I have seen this as well, and finally traced it to ATM interfaces. Devmon does not give the different components of an ATM circuit (the physical interface, the -atm layer, the .0 subinterface, and the -aal5 layer) unique names, so rrd was receiving data for 5 interfaces that all had the same name. As a temporary workaround, I stopped monitoring the ATM interfaces, but this is a bug.
Interface names:
  ATM5/0/0
  ATM5/0/0-atm layer
  ATM5/0/0.0-atm subif
  ATM5/0/0-aal5 layer
  ATM5/0/0.0-aal5 layer
Devmon sees all of these as ATM5/0/0, because the devmon templates (at least for 6509s) use ifName as the main identifier, which is not always unique. Not sure of a solution yet. MRTG uses ifIndex as its unique key.
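To illustrate the idea of falling back to ifIndex when ifName collides, here is a minimal sketch in Python. It is not devmon code and not a proposed patch: the function name `unique_interface_keys`, the `#<ifIndex>` suffix convention, the ifIndex values, and the `Gi1/1` entry are all hypothetical and exist only for this example.

```python
from collections import Counter


def unique_interface_keys(interfaces):
    """Map each ifIndex to a key that is unique across the device.

    `interfaces` is a list of (ifIndex, ifName) tuples. A name that occurs
    only once is kept as-is; a name shared by several interfaces gets the
    ifIndex appended so each one ends up with its own key.
    """
    name_counts = Counter(name for _, name in interfaces)
    keys = {}
    for if_index, name in interfaces:
        if name_counts[name] > 1:
            # Duplicate ifName: disambiguate with the ifIndex (hypothetical convention).
            keys[if_index] = f"{name}#{if_index}"
        else:
            keys[if_index] = name
    return keys


# Hypothetical ifIndex values. The five ATM entries all carry the same
# ifName, which is how devmon sees them according to the report above.
sample = [
    (10, "ATM5/0/0"),
    (11, "ATM5/0/0"),
    (12, "ATM5/0/0"),
    (13, "ATM5/0/0"),
    (14, "ATM5/0/0"),
    (20, "Gi1/1"),
]

for if_index, key in sorted(unique_interface_keys(sample).items()):
    print(if_index, key)
```

Only the colliding names are changed here, so interfaces that already have a unique ifName would keep their existing names. Whether that matters for existing rrd files depends on how devmon names its datasets, which this sketch does not model.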
Robert
On Fri, Oct 31, 2008 at 3:15 AM, Buchan Milne <bgmilne at staff.telkomsa.net> wrote:
On Friday 31 October 2008 05:51:42 Everett, Vernon wrote:
Hi all
Devmon was causing the hobbitd_rrd module to crash and burn. Now this could be a bug, but it could also be a PEBKAC. I am hoping somebody can assist either way.
I added a Cisco 2851 to Hobbit, using devmon. Now here is the possible PEBKAC: since Devmon doesn't have templates for the 2851, I used the template for the Cisco 2811. (Network guru told me they are pretty much the same, except for a few extra bells and whistles on the 2851.)
The data for the device started appearing in Hobbit, and all looked good. Devmon even created the rrd files for the new Cisco device.
However, the hobbitd_rrd module started core dumping, and the Hobbit server page started displaying red for hobbitd_rrd with the crash detected message. See core data below. Took the new Cisco device out of Hobbit, and cores stopped, and life was good again.
Is there a significant enough difference between the 2851 and the 2811 to cause this, or are we looking at a genuine bug?
Real bug. I see it on the temperature tests on a new IOS.
I am leaning towards a bug, because even if the collected data was complete rubbish, should it cause the module to core?
Regards Vernon
My Linux guy reckons this is the important stuff from the core.

uname -a
Linux las006 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux Client release 5.2 (Tikanga)
gdb -c core.8550 /usr/lib/hobbit/server/bin/hobbitd_rrd
GNU gdb Red Hat Linux (6.5-37.el5_2.1rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Reading symbols from /usr/lib64/librrd.so.2...done.
Loaded symbols for /usr/lib64/librrd.so.2
Reading symbols from /usr/lib64/libpng12.so.0...done.
Loaded symbols for /usr/lib64/libpng12.so.0
Reading symbols from /lib64/libpcre.so.0...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libfreetype.so.6...done.
Loaded symbols for /usr/lib64/libfreetype.so.6
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/libart_lgpl_2.so.2...done.
Loaded symbols for /usr/lib64/libart_lgpl_2.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --debug'.
Program terminated with signal 6, Aborted.
#0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003db7a30155 in raise () from /lib64/libc.so.6
#1 0x0000003db7a31bf0 in abort () from /lib64/libc.so.6
#2 0x00000000004119f3 in sigsegv_handler (signum=<value optimized out>) at sig.c:57
#3 <signal handler called>
#4 0x0000003db7a77ac0 in strcat () from /lib64/libc.so.6
#5 0x000000000040462a in do_devmon_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=<value optimized out>, tstamp=<value optimized out>) at rrd/do_devmon.c:87
#6 0x000000000040b656 in update_rrd (hostname=0x2ada311e2806 "PERIR205", testname=0x2ada311e280f "if_load", msg=0x2ada311e2842 "status PERIR205.if_load green Fri Oct 31 10:31:39 2008", tstamp=1225416699, sender=<value optimized out>, ldef=0xfeffffffffffff00) at do_rrd.c:372
#7 0x000000000040261d in main (argc=<value optimized out>, argv=0x7fff7a088318) at hobbitd_rrd.c:153
(gdb)
Could you show the Devmon RRD section of the message for the if_load test on the PERIR205 host? I can confirm the cause, and maybe offer a workaround.
I am actually (constantly) reproducing the issue on my workstation against the new IOS that can trigger it. I have a workaround in place in production, and was hoping to get around to fixing this next week.
Regards, Buchan