Hi all
Even with --no-cache, I am still getting these corrupted RRD files. I tried again with --debug (and --no-cache), and xymond_rrd dumps core.
Here's the backtrace.
adb /zones/smcconsole/root/opt/local/xymon/server/bin/xymond_rrd ./core_smcconsole_xymond_rrd_61_61_1426033626_20899
core file = ./core_smcconsole_xymond_rrd_61_61_1426033626_20899 -- program ``/zones/smcconsole/root/opt/local/xymon/server/bin/xymond_rrd'' on platform SUNW,SPARC-Enterprise-T2000
SIGABRT: Abort
$c
libc.so.1`_lwp_kill+8(6, 0, 0, 3a9a0, ffffffff, 6)
libc.so.1`abort+0x110(0, 1, feebafec, ffb3c, fef35518, 0)
sigsegv_handler+0x30(b, 0, ffbfa160, 0, fe8a2a00, ffbfa160)
libc.so.1`__sighndlr+0xc(b, 0, ffbfa160, 3a970, 0, 1)
libc.so.1`call_user_handler+0x3b8(b, 0, 0, 0, fe8a2a00, ffbfa160)
libc.so.1`sigacthandler+0x60(b, 0, ffbfa160, ffbfb290, 0, fee91924)
libc.so.1`strlen+0x50(514cf, ffbfb3ec, ffbfa9d1, 0, 0, 0)
libc.so.1`vfprintf+0xec(6c3d0, 514c0, ffbfb3e8, 0, a0ba4, 33e1c)
dbgprintf+0xa4(514c0, 0, 51400, 6c3f0, bf, 2ab388)
dump_tcp_services+0x74(a0, 1c00, fef37940, 0, 51400, 51400)
init_tcp_services+0x91c(6a400, 51400, 51400, 6c3f0, bf, 23)
rrd_setup+0x15c(93314, ffbfcfc4, 63c00, ffbfcf50, 6a400, 6a400)
find_xymon_rrd+4(93314, 511e8, 54ff8bda, 0, 932f3, 2e)
main+0x948(93314, ffbfcfc4, 63c00, ffbfcf50, 54ff8bda, 93374)
_start+0x5c(0, 0, 0, 0, 0, 0)
Anything else I can offer that will assist?
Regards,
Vernon
On 5 March 2015 at 09:11, J.C. Cleaver <cleaver at terabithia.org> wrote:
On Wed, March 4, 2015 2:52 pm, Jeremy Laidman wrote:
On 04/03/2015 6:02 PM, "Vernon Everett" <everett.vernon at gmail.com> wrote:
Looks like we might need to check with JC for more on that GOCLIENT thing. I just find it odd that it happened about the same time as the corruption. I haven't seen it again today, and haven't seen any other corruption either.
If there's a correlation it might help us work out where the fault is. But it might be only a symptom.
As for the --debug option, it caused xymond_rrd to crash and burn, dumping cores as we go.
Could be that the same bug causing the crash during debug is also causing the corrupt filename. Have you analyzed the core dumps?
GOCLIENT is indeed the means by which xymond_channel listeners communicate with xymond for the picking up of messages over SysV IPC. I believe the messages there are just a side effect of it re-launching the channel listener pipe to xymond_rrd.
The cache routines in xymond_rrd should be stable at this point. Can you send a backtrace in from one of the cores? I'm curious where things could be acting up here.
Regards,
-jc
-- "Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton