I have successfully migrated from BB to hobbit (excellent software IMHO), however there is a small thing that I cannot get solved.
I collect rrd data for the tests vmstat, vmstat1, vmstat2, vmstat3, vmstat4 and vmstat5.
If I build a URL like: http://MYBBDISPLAY/hobbit-cgi/hobbitgraph.sh?host=AHOBBITCLIENT&service=vmst...
I can see the graph just fine (so the data in the rrd files and the graph definitions, which come standard in hobbitgraph.cfg, are OK). However, I cannot get hobbit to display the graph and link on the trends page. It only displays vmstat, but not vmstat1-vmstat5.
According to the documentation I should just add the graph names to the GRAPHS variable at hobbitserver.cfg, but I have already done that: GRAPHS="la,disk,inode,qtree,memory,users,vmstat,vmstat3,vmstat4,vmstat5,iostat,tcp.http,tcp,netstat,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,postfix,mrtg::1"
The funny thing is that I also made a custom graph 'postfix' for postfix queues and that one shows up on the trends page perfectly, it is only the vmstat family which refuses to appear.
My version is hobbit-4.1.2p1-1 on Fedora Core 3.
Thanks in advance for your help.
My hobbit server has been error free since I installed 4.1.2, but in the last day or so it has had an error for hobbitd_rrd.
The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)
I can't find anything in the archives about this.
No changes made to the system, no client added. I've stopped and started hobbit with no change.
The rrd graphs show gaps every 10 - 20 minutes or so.
Any suggestions? more data required?
Regards Geoff
-------------------------------Safe Stamp----------------------------------- The sender's Anti-virus Service scanned this email. It is safe from known viruses.
On Mon, Mar 06, 2006 at 03:55:13PM +1100, Geoff Steer wrote:
My hobbit server has been error free since I installed 4.1.2, but in the last day or so it has had an error for hobbitd_rrd.
The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)
This usually indicates some sort of corruption of the memory allocation inside hobbitd_rrd. Since hobbitd_rrd depends on the rrdtool library, it could also be a problem with that.
Since it's glibc you're probably on a Linux/Intel platform. Would it be possible for you to run the hobbitd_rrd command through the "Valgrind" memory checker ? I don't know if Valgrind is included with your distribution - it is part of the standard Debian release, but your distro might be different. If you can get it installed, then just change the command in the "[rrddata]" section from
CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/rrd-data.log
hobbitd_rrd --rrddir=$BBVAR/rrd
to
CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/rrd-data.log
valgrind --log-file=$BBSERVERLOGS/valgrind.log
hobbitd_rrd --rrddir=$BBVAR/rrd
Let it run until the errors show up, then send me the valgrind.log.* files.
Regards, Henrik
I've finally gotten back to looking at this problem and have some more info that may be relevant. It hasn't been high on the list as hobbit is still working fine for alerts.
Firstly, I've tried removing the existing rrd files and letting hobbit create new ones, no change - the core files still are produced.
I've tried building hobbit 4.1.2p1 with rrdtool 1.2.11 and with 1.2.12, no change. This is with existing rrd files and also letting hobbit create new ones as required.
Looking at the current core files with gdb, it seems that they all report an error related to sendmail:
(gdb) bt
#0 0x00abe7a2 in ?? () from /lib/ld-linux.so.2
#1 0x00afe7d5 in raise () from /lib/tls/libc.so.6
#2 0x00b00149 in abort () from /lib/tls/libc.so.6
#3 0x08054af2 in sigsegv_handler (signum=11) at sig.c:57
#4 0x00afe8c8 in killpg () from /lib/tls/libc.so.6
#5 0x0804e011 in do_sendmail_rrd ( hostname=0xb7f6f037 "outrelay1.firstwave.com.au", testname=0xb7f6f052 "sendmail", msg=0xbffc5dd0 tstamp=1143771322) at rrd/do_sendmail.c:127
#6 0x08050120 in update_rrd ( hostname=0xb7f6f037 "outrelay1.firstwave.com.au", testname=0xb7f6f052 "sendmail", msg=0xb7f6f05b "data outrelay1,firstwave,com,au.sendmail Fri Mar 31 13:15:22 EST 2006\nStatistics from Tue Jun 21 10:47:07 2005\n M msgsfr bytes_from msgsto bytes_to msgsrej msgsdis msgsqur Mailer\n 3 25299848"..., tstamp=1143771322, sender=0x0, ldef=0x0) at do_rrd.c:271
#7 0x08049e3a in main (argc=0, argv=0xbffca4e4) at hobbitd_rrd.c:199
I'm ready to rebuild the server entirely, but I'm not convinced that this will resolve the issue. As I said previously, this setup has been working fine for months; the problem started for no obvious reason in early March.
Regards geoff
On Mon, 2006-03-06 at 10:58 +0100, Henrik Stoerner wrote:
On Mon, Mar 06, 2006 at 03:55:13PM +1100, Geoff Steer wrote:
My hobbit server has been error free since I installed 4.1.2, but in the last day or so it has had an error for hobbitd_rrd.
The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)
This usually indicates some sort of corruption of the memory allocation inside hobbitd_rrd. Since hobbitd_rrd depends on the rrdtool library, it could also be a problem with that.
Since it's glibc you're probably on a Linux/Intel platform. Would it be possible for you to run the hobbitd_rrd command through the "Valgrind" memory checker ? I don't know if Valgrind is included with your distribution - it is part of the standard Debian release, but your distro might be different. If you can get it installed, then just change the command in the "[rrddata]" section from
CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/rrd-data.log
hobbitd_rrd --rrddir=$BBVAR/rrd
to
CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/rrd-data.log
valgrind --log-file=$BBSERVERLOGS/valgrind.log
hobbitd_rrd --rrddir=$BBVAR/rrd
Let it run until the errors show up, then send me the valgrind.log.* files.
Regards, Henrik
On Fri, Mar 03, 2006 at 10:52:48PM +0100, Eduardo Mayoral wrote:
however I cannot get hobbit to display the graph and link in the trends page. It only displays vmstat , but not vmstat1-vmstat5
According to the documentation I should just add the graph names to the GRAPHS variable at hobbitserver.cfg, but I have already done that: GRAPHS="la,disk,inode,qtree,memory,users,vmstat,vmstat3,vmstat4,vmstat5,iostat,tcp.http,tcp,netstat,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,postfix,mrtg::1"
I'm afraid you've misunderstood the docs (which means I'll need to make them more explicit). The GRAPHS setting is "only" used to find out which RRD databases should be used on the trends page. But vmstat has multiple datasets inside a single RRD database, and you want to show several of the datasets on your trends page.
The answer to that one is to add a TRENDS setting to the host entry in the bb-hosts file. Like
10.0.0.1 myhost.foo.com # TRENDS:*,vmstat:vmstat|vmstat3|vmstat4|vmstat5
Regards, Henrik
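For anyone who generates bb-hosts entries from a script, the TRENDS tag from Henrik's example can be assembled along these lines. This is only a sketch: the helper names `trends_tag` and `bb_hosts_line` are made up for illustration, and the tag syntax is copied from the example line above rather than from the full bb-hosts documentation.

```python
def trends_tag(rrd, datasets, include_default=True):
    """Build a tag such as 'TRENDS:*,vmstat:vmstat|vmstat3'.

    The leading '*' mirrors the example in the message above; it is
    assumed here to keep the normal set of graphs on the trends page.
    """
    parts = ["*"] if include_default else []
    parts.append(f"{rrd}:{'|'.join(datasets)}")
    return "TRENDS:" + ",".join(parts)


def bb_hosts_line(ip, hostname, tag):
    """Format one bb-hosts entry with the tag after the '#' separator."""
    return f"{ip} {hostname} # {tag}"


line = bb_hosts_line(
    "10.0.0.1",
    "myhost.foo.com",
    trends_tag("vmstat", ["vmstat", "vmstat3", "vmstat4", "vmstat5"]),
)
print(line)
```

With these inputs the printed line is the same as the bb-hosts entry Henrik gives above.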
Hi Henrik,
Doing content checks on "large" web pages (13M) disturbs hobbitd; in the log: "Data flooding from 10.33.254.87, closing connection", causing a bunch of network checks to go purple.
That URL was 13M because of a big tomcat dump... and we (sysadmins) don't control the size of the web pages...
Do you have a workaround for this?
Regards,
Olivier
On Thu, Mar 09, 2006 at 02:49:55PM +0100, Olivier Beau wrote:
Hi Henrik,
Doing content checks on "large" web pages (13M) disturbs hobbitd; in the log: "Data flooding from 10.33.254.87, closing connection", causing a bunch of network checks to go purple.
This is really a safety/security thing to avoid hobbitd consuming all of memory. Since hobbitd keeps everything in memory, it would be too easy to launch a denial-of-service attack by just flooding it with data.
That URL was 13M because of a big tomcat dump... and we (sysadmins) don't control the size of the web pages...
I hope your developers weren't forced to explain every bit of that dump :-)
Do you have a workaround for this?
Try the attached patch for the network test tool. It limits the amount of content data that is sent across to 1 MB, but the content check itself is performed on the full amount of data.
Untested, but fairly simple so I would expect it to work.
Regards, Henrik
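The idea behind the patch, as Henrik describes it, is to run the content check on everything that was fetched but pass on only a capped amount of it. The patch itself is not reproduced here; the following is just a rough illustration of that idea, with the function name `check_content` and the constant `MAX_REPORT_BYTES` invented for the example.

```python
import re

# 1 MB cap on what gets passed on, as described in the message above.
MAX_REPORT_BYTES = 1024 * 1024


def check_content(body: bytes, pattern: bytes, limit: int = MAX_REPORT_BYTES):
    """Match against the full body, return at most `limit` bytes for reporting."""
    matched = re.search(pattern, body) is not None
    reported = body[:limit]  # only this part would be sent across
    return matched, reported


# A 13 MB page whose interesting text sits at the very end.
body = b"x" * (13 * 1024 * 1024) + b"Tomcat OK"
matched, reported = check_content(body, rb"Tomcat OK")
print(matched, len(reported))
```

The pattern is still found even though it lies well past the first megabyte, while the data handed on stays within the cap.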
Hi Henrik,
In bb-hosts, setting an apache tag with an invalid URL (apache=www.toto.com/server-status?auto) causes the whole bbtest-net to fail and dump core:
(gdb) bt
#0 0x0026deff in raise () from /lib/tls/libc.so.6
#1 0x0026f705 in abort () from /lib/tls/libc.so.6
#2 0x08059cf1 in xstrdup (s=0x0) at memory.c:175
#3 0x08053105 in add_http_test (t=0x9195e10) at httptest.c:418
#4 0x0804f5c9 in main (argc=9, argv=0xbfffa264) at bbtest-net.c:2227
Regards,
Olivier Beau
participants (4)
- emayoral@arsys.es
- gsteer@firstwave.com.au
- henrik@hswn.dk
- olivier@qalpit.com