On Fri, Apr 01, 2005 at 05:22:42PM -0500, Deal, Richard wrote:
My hobbitd is core dumping every so often and, less often but still occasionally, the trends column turns purple.
hobbitd crashing - that's bad.
Could you run the core dump through gdb and send me the call trace? Do this:
$ gdb ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd
[messages from gdb]
(gdb) bt
and send me the output from that "bt" command.
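If you'd rather not drive gdb interactively, something like the following should print the same trace in one go. It is a minimal sketch: `hobbitd_backtrace` is a hypothetical helper name, and it assumes your gdb supports the `-batch` and `-ex` options. It takes the hobbitd binary first and the core file second, in the same order as the gdb command above.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace from a hobbitd core file non-interactively.
# Usage: hobbitd_backtrace BINARY CORE
hobbitd_backtrace() {
    binary="$1"
    core="$2"
    # Refuse to run if the core file is missing or unreadable
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 1
    fi
    # -batch makes gdb exit after the commands; -ex runs "bt" once the core is loaded
    gdb -batch -ex bt "$binary" "$core"
}
```

For example: `hobbitd_backtrace ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd > hobbitd-bt.txt`, then send the contents of `hobbitd-bt.txt`.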
Looking through the makefile, the only oddity is MAXMSG=32768, whereas my old BBd was set to #define MAXLINE 11264.
Shouldn't cause any problems, it just means Hobbit will accept larger messages than your BB setup.
more bb-display.log
2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:03:00 connect to bbd failed - Connection refused
Probably a result of hobbitd being down.
I have a lot of these errors in larrd-data.log from various hosts.
2005-04-01 17:17:53 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from 172.17.10.20: expected 12 data source readings (got 16) from
The "netstat" and "vmstat" RRD files from LARRD are not compatible with Hobbit. Do a
find ~hobbit/data/rrd -name netstat.rrd | xargs rm -f
to delete the old files.
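The command above only catches netstat.rrd; since the vmstat files are incompatible too, a slightly more cautious variant lists both kinds of file first and only deletes them when asked. This is a sketch assuming the old LARRD files are named `netstat.rrd` and `vmstat.rrd`; `purge_larrd_rrds` is a hypothetical helper name. Keep in mind that any history stored in those files is lost.

```shell
#!/bin/sh
# Hypothetical helper: find (and optionally delete) LARRD-era netstat/vmstat RRD files.
# Usage: purge_larrd_rrds RRDDIR [--delete]
# Without --delete it is a dry run that only prints the matching files.
purge_larrd_rrds() {
    rrddir="$1"
    mode="$2"
    if [ ! -d "$rrddir" ]; then
        echo "not a directory: $rrddir" >&2
        return 1
    fi
    if [ "$mode" = "--delete" ]; then
        # -exec ... {} + avoids the word-splitting pitfalls of a plain xargs pipe
        find "$rrddir" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -exec rm -f {} +
    else
        # Dry run: list what would be removed
        find "$rrddir" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -print
    fi
}
```

Run `purge_larrd_rrds ~hobbit/data/rrd` first to check the list, then `purge_larrd_rrds ~hobbit/data/rrd --delete`.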
2005-04-01 17:18:10 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature.module_6_asic-.rrd from 172.17.10.16: illegal attempt to update using time 1112393889 when last update time is 1112393889 (minimum one second step)
This is a bit more tricky. It means that the same RRD file was being updated by two status messages within one second - that normally should not happen, because a status is sent every 5 minutes. It can happen if you have two hosts reporting the same hostname (one of them would be the 172.17.10.16 IP you have in that error message).
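To see which machines are involved, you can pull the host directory and the sending IP out of the "minimum one second step" errors. This is a sketch assuming the log lines look like the one quoted above; `dup_reporters` is a hypothetical helper name. Since only the rejected update gets logged, it may take a while before both IPs for a host show up.

```shell
#!/bin/sh
# Hypothetical helper: list "host-directory sender-IP" pairs from the
# "minimum one second step" errors in a larrd-data.log (path given as $1).
# Two different IPs for the same host suggest two machines share that hostname.
dup_reporters() {
    log="$1"
    grep 'minimum one second step' "$log" |
        sed -n 's|.*RRD error updating .*/rrd/\([^/]*\)/.* from \([0-9.]*\): illegal attempt.*|\1 \2|p' |
        sort -u
}
```

Piping the result through `awk '{print $1}' | uniq -d` leaves only the hostnames seen from more than one IP.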
Regards, Henrik