On Friday, 21 August 2009 00:42:59 David Baldwin wrote:
j.sansford at ntlworld.com wrote:
Hi Buchan,
We get a core dump; running pstack on it gives the following info:
core 'core' of 11142: hobbitd_rrd --rrddir=/export/home/hobbit/data/rrd
 fed28a17 _lwp_kill (1, 6) + 7
 fecd1d63 raise (6) + 1f
 fecb1bad abort (806fe88, fecd55f6, 8768eb0, 806a6ca, fed901c0, 0) + cd
 08060291 xstrdup (0, 806a6ca, 87d9d1c, 8081cc0, 84ed451, 0) + 31
 0805bf7c do_netapp_extratest_rrd (84ec4ff, 806af10, 84ec8fa, 4a8b1bbf, 8081a00, 8081cc0) + 200
 0805c1c9 do_netapp_extrastats_rrd (84ec4ff, 84ec509, 84ec511, 4a8b1bbf, 84ec4f4, 4a8b1bbf) + e1
 0805e0ea update_rrd (84ec4ff, 84ec509, 84ec511, 4a8b1bbf, 84ec4f4, 0) + 7d6
 08054044 main (2, 804613c, 8046148) + 4dc
 080539fc _start (2, 8046484, 8046490, 0, 80464b6, 80464f6) + 80
OK, so it crashed in do_netapp_extratest_rrd from hobbitd/rrd/do_netapp.c. I'm not familiar with pstack, but it looks like this may be from a stripped binary (or you may be able to get more information from pstack).
If pstack can't show the values, then you may want to consider running hobbitd_rrd with the --debug flag, which should result in some logging of what it has received just before it crashes.
That looks like you are running the extratest for a NetApp, which, from what I can see in hobbitd/do_rrd.c, is what handles the xtstats column reported by netapp.pl. That is just from a cursory glance at the code; I don't use it myself. You really need to look at the C code to check it's doing the right thing. You have two choices: the quick fix is to disable just that test in netapp.pl; the other option is to work out what format the data should be in and fix the test.
In 4.2.3 for example, the do_devmon.c RRD code doesn't actually implement what is documented
What is not implemented?
Where do you see this documented?
There is one fix that I have committed in svn (Xymon 4.2 branch, Xymon 4.3 branch, devmon svn). I am not aware of any other requests or bugs filed on the devmon rrd collector.
and I use a perl script with --extra-script instead
Is this the one shipped with devmon, or would you like to contribute a better one?
Various RRD handlers are in hobbitd/rrd/do_*.c. Looking at the code for xstrdup in lib/memory.c (below), you should check your logs: it's probably being called with a NULL pointer (it is unlikely that you're out of memory), and the logs should tell you which.
char *xstrdup(const char *s)
{
    char *result;

    if (s == NULL) {
        errprintf("xstrdup: Cannot dup NULL string\n");
        abort();
    }

    result = strdup(s);
    if (result == NULL) {
        errprintf("xstrdup: Out of memory\n");
        abort();
    }

#ifdef MEMORY_DEBUG
    add_to_memlist(result, strlen(result)+1);
#endif

    return result;
}
xstrdup is called twice in do_netapp_extratest_rrd, but seeing the string that it's aborting on would help narrow it down. If you can provide the status message that made hobbitd_rrd crash (retrieve it using: bb localhost 'hobbitdlog hostname.testname') it can be used to reproduce this by someone trying to fix the bug.
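To illustrate how a NULL can reach xstrdup in the first place, here is a minimal sketch of the general pattern. It is not the code from do_netapp.c: the helper name dup_after_marker and the "section: " marker are hypothetical, chosen only for the example. The idea is that a lookup such as strstr() returns NULL when the expected text is missing from the status message, and passing that result straight to xstrdup would abort. Checking for NULL first lets the caller skip the malformed message instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Xymon source): copy the text that
 * follows `marker` in `msg`, up to the end of that line.
 * Returns NULL instead of aborting when the input or the marker is
 * missing, so the caller can skip a malformed message.
 * The caller must free() the returned string.
 */
char *dup_after_marker(const char *msg, const char *marker)
{
    const char *start;
    size_t len;
    char *copy;

    if (msg == NULL || marker == NULL) return NULL;

    start = strstr(msg, marker);    /* NULL when the marker is absent */
    if (start == NULL) return NULL;

    start += strlen(marker);
    len = strcspn(start, "\n");     /* length up to the end of the line */

    copy = malloc(len + 1);
    if (copy == NULL) return NULL;  /* out of memory */

    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

A caller would test the return value and log the offending message rather than crash, which would also make it easier to see what format the handler is actually receiving.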
Note that as of 5.30pm today rrd-status.log is 127MB and full of errors, spanning 607625 lines (this is just for today; we roll the logs each night). This seems abnormally large to me, and I think it is eventually crashing the server.
It is still unlikely that this has anything to do with hobbitd_rrd crashing.
Regards, Buchan