On 31-01-2013 15:51, nloyau.ext at orange.com wrote:
I discovered a bug in the xymond_rrd module, present in release 4.3.7 and later. Sometimes, in the function update_rrd in do_rrd.c, the message pointer (msg) can be null. That causes a segfault at the strchr call.
The coredump analysis follows:
#0 0xb775e424 in __kernel_vsyscall ()
#1 0xb741e781 in raise () from /lib/i686/cmov/libc.so.6
#2 0xb7421bb2 in abort () from /lib/i686/cmov/libc.so.6
#3 0x08070723 in sigsegv_handler (signum=11) at sig.c:57
#4 <signal handler called>
#5 0xb7466a83 in strchr () from /lib/i686/cmov/libc.so.6
#6 0x08050b80 in do_ncv_rrd (hostname=0xb6c847af "lvmpitg-sql02o.lvm93.cvf", testname=0xb6c847c8 "mbs_oracle1", classname=0x807fc0d "", pagepaths=0x807fc0d "", msg=0x0, tstamp=1359477166) at rrd/do_ncv.c:54
#7 0x0805a45f in update_rrd (hostname=0xb6c847af "lvmpitg-sql02o.lvm93.cvf", testname=0xb6c847c8 "mbs_oracle1", msg=0x0, tstamp=1359477166, sender=0xb6c847a1 "10.193.26.14", ldef=0x9f5d608, classname=0x807fc0d "", pagepaths=0x807fc0d "") at do_rrd.c:712
#8 0x0804a90f in main (argc=3, argv=0xbfadf744) at xymond_rrd.c:369
I can see how this can happen, and it should not crash on that. But the underlying problem is that your host lvmpitg-sql02o.lvm93.cvf sends a status or data message with no data in it, not even a newline after the first line.
A better patch for this is attached; it handles all cases where there is a null message, not just the NCV case.
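For readers without the attachment, the failure mode can be illustrated with a minimal sketch. This is not the attached patch and not the actual xymond_rrd code: message_body and count_lines are hypothetical names, and the sketch assumes the body pointer is derived by looking for the first newline. It only shows how a message with no newline after its first line yields a NULL body, and how a single guard keeps a handler from passing that NULL to strchr.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text following the first
 * line of a message, or NULL when the message has no newline at all
 * (the situation described above). */
static const char *message_body(const char *msg)
{
    const char *eoln;

    if (msg == NULL) return NULL;
    eoln = strchr(msg, '\n');
    return (eoln != NULL) ? eoln + 1 : NULL;
}

/* Hypothetical stand-in for an RRD handler: count the lines in the body.
 * Without the NULL check, strchr(NULL, ...) is undefined behavior and
 * typically ends in the kind of SIGSEGV seen in the backtrace. */
static int count_lines(const char *body)
{
    int n = 0;

    if (body == NULL) return 0;   /* guard: nothing to process */

    while (*body != '\0') {
        const char *eoln = strchr(body, '\n');
        n++;
        if (eoln == NULL) break;  /* last line had no trailing newline */
        body = eoln + 1;
    }
    return n;
}
```

Placing the check in the common caller rather than in each handler would cover every test type with one guard, which matches the stated intent of handling all null-message cases and not only NCV.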
Regards, Henrik