On 11 March 2015 at 11:37, Vernon Everett <everett.vernon at gmail.com> wrote:
And even with --no-cache, I am still getting these corrupted rrd files.
:-(
I tried again with --debug (and --no-cache) and it core dumps.
Here's the backtrace.
libc.so.1`vfprintf+0xec(6c3d0, 514c0, ffbfb3e8, 0, a0ba4, 33e1c)
dbgprintf+0xa4(514c0, 0, 51400, 6c3f0, bf, 2ab388)
dump_tcp_services+0x74(a0, 1c00, fef37940, 0, 51400, 51400)
So dump_tcp_services() calls dbgprintf() (both in lib/netservices.c), which in turn calls vfprintf() from libc, but with bad parameters. I've had a look through the code in dump_tcp_services() and I don't know enough C to recognize any problems. But it might be useful to know which call to dbgprintf() is causing the problem.
Does the log file for xymond_rrd show any debug output at all? If so, what is the last line shown?
It might be helpful if you can recompile xymond_rrd with dump_tcp_services() modified. Initially, I would simply try it with "return" added after the first call to dbgprintf(). That is, dump_tcp_services() will output "Service list dump" and return. This might stop the core dumps so that we can get debug output for other parts of the xymond_rrd processing.
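As a rough sketch of the change I mean (this is not the actual Xymon source: dbg_sketch() and dump_tcp_services_sketch() are hypothetical stand-ins for dbgprintf() and dump_tcp_services(), and the return value is only there so the behaviour can be checked):

```c
#include <stdarg.h>
#include <stdio.h>

static int debug = 1;   /* stand-in for the debug flag set by --debug */

/* Hypothetical stand-in for dbgprintf(): prints only when debugging is on.
   Returns 1 if a line was written, 0 otherwise. */
static int dbg_sketch(const char *fmt, ...)
{
    va_list args;

    if (!debug) return 0;

    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fflush(stderr);
    return 1;
}

/* Hypothetical stand-in for dump_tcp_services() with the early return added.
   Returns the number of debug lines written. */
int dump_tcp_services_sketch(void)
{
    int lines = 0;

    lines += dbg_sketch("Service list dump\n");
    return lines;   /* TEMPORARY, for diagnosis only: skip the per-service output */

    /* The original loop over svcinfo would follow here and is now unreachable. */
}
```

With this in place only the "Service list dump" line is printed, so if the core dump goes away, the faulty dbgprintf() call is one of those inside the loop.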
If adding "return" at that point fixes this core dump, more diagnostic lines would be useful to determine what the problem is. For example, there's a global array called svcinfo that is iterated over, but if the array is empty, it might cause the core dump. So adding a line that checks whether the array is empty and displays the result would help to pin this down.
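Something along these lines might do as the extra diagnostics. It is only a sketch under assumptions: svc_sketch_t is a made-up, cut-down version of an svcinfo entry, I am assuming the list ends at an entry whose name is NULL, and safe_str() and dump_services_checked() are hypothetical helpers, not Xymon functions. The NULL guard is there because passing a NULL pointer to a %s conversion is undefined behaviour in C, and some libc implementations crash on it instead of printing "(null)".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for one svcinfo entry. */
typedef struct {
    const char *svcname;   /* assumed: NULL marks the end of the list */
    const char *sendtxt;   /* may legitimately be NULL */
    int port;
} svc_sketch_t;

/* Never hand a NULL pointer to a %s conversion. */
static const char *safe_str(const char *s)
{
    return (s != NULL) ? s : "(null)";
}

/* Dump the list with explicit checks.
   Returns the number of entries dumped, or -1 if the list pointer is NULL. */
int dump_services_checked(const svc_sketch_t *list, FILE *out)
{
    int i;

    if (list == NULL) {
        fprintf(out, "Service list pointer is NULL\n");
        return -1;
    }
    if (list[0].svcname == NULL) {
        fprintf(out, "Service list is empty\n");
        return 0;
    }

    for (i = 0; list[i].svcname != NULL; i++) {
        fprintf(out, " Name: %s, send: %s, port: %d\n",
                list[i].svcname, safe_str(list[i].sendtxt), list[i].port);
    }
    return i;
}
```

If the "empty" or "NULL" message shows up in the xymond_rrd log, that points at protocols.cfg or XYMONNETSVCS rather than at the debug code itself.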
Note that "svcinfo" appears to be populated from the protocols.cfg file and/or XYMONNETSVCS. Is it possible that your protocols.cfg file is empty, or has some syntax error that causes it to be unparseable? The same for XYMONNETSVCS (in xymonserver.cfg)?
J