hobbitd coredumping and purple trends
My hobbitd is core dumping every so often and, less often but still occasionally, the trends column turns purple.
Looking through the Makefile, the only oddity is MAXMSG=32768, whereas my old BBd was set to #define MAXLINE 11264.
I have core files in /tmp from hobbitd
Logs :
more bb-display.log
2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:18:05 Whoops ! bb failed to send message - timeout
2005-04-01 17:03:08 Whoops ! bb failed to send message - timeout
more hobbitd.log
2005-04-01 15:32:47 Setup complete
2005-04-01 15:32:54 Setup complete
2005-04-01 15:48:01 Setup complete
2005-04-01 16:03:01 Setup complete
2005-04-01 16:33:03 Setup complete
2005-04-01 16:48:04 Setup complete
I have a lot of these errors in larrd-data.log from various hosts.
2005-04-01 17:17:53 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from 172.17.10.20: expected 12 data source readings (got 16) from 1112393873:597496849:203665680:0:1400608:474490:380897:4323:190:655849103:2750185864:9271815:54370878:358842800:919424657:55608:57615:...
2005-04-01 17:18:15 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/akela.tigr.org/netstat.rrd from 172.17.10.87: expected 12 data source readings (got 16) from 1112393894:7278664:4601574:0:2187293:80558:15408:1028:18:3786687185:33199304:551592:3055134:392628802:534540232:12324:8938:...
2005-04-01 17:18:22 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/vader.tigr.org/netstat.rrd from 172.16.4.50: expected 12 data source readings (got 16) from 1112393902:844147:844153:0:173177:11681993:15774:1756237:109:2946405093:1171800154:1508:44541250:1263968085:53592252:29:1305303:...
2005-04-01 17:18:49 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/invino.tigr.org/netstat.rrd from 172.17.10.29: expected 12 data source readings (got 16) from 1112393929:161474660:161355279:0:979032:1013326:8108:2751:26:3077107260:3115145104:3779497608:1171327:3474031250:2366740414:176290878:15382:...
I used the moverrd.sh script.
And these errors from larrd-status.log:
005-04-01 17:18:10 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature.module_6_asic-.rrd from 172.17.10.16: illegal attempt to update using time 1112393889 when last update time is 1112393889 (minimum one second step)
2005-04-01 17:20:04 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/utah.tigr.org/disk.rrd from 172.17.10.79: illegal attempt to update using time 1112394004 when last update time is 1112394004 (minimum one second step)
2005-04-01 17:20:04 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/utah.tigr.org/disk.rrd from 172.17.10.79: illegal attempt to update using time 1112394004 when last update time is 1112394004 (minimum one second step)
2005-04-01 17:21:27 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/atlas.tigr.org/netstat.rrd from 172.17.10.80: expected 11 data source readings (got 16) from 1112394087:23501770:2904610:0:97558:26724:76:17:8:U:U:U:U:226801128:297662863:U:956:...
Any suggestions? Thanks.
I'm still running RC6, and I see the same behaviour: several cores in tmp/ (about a dozen per day). They seem to be from bbtest-net, but there are also bbgen cores!
I have also seen my hobbitd balk at listening on port 1984 (telnet localhost 1984 would not answer; a couple of seconds later it would).
Henrik: can these two problems be related?
olivier
Quoting "Deal, Richard" <rdeal at tigr.org>:
> My hobbitd is core dumping every so often and, less often but still occasionally, the trends column turns purple.
> Looking through the Makefile, the only oddity is MAXMSG=32768, whereas my old BBd was set to #define MAXLINE 11264.
> I have core files in /tmp from hobbitd
> Logs :
> more bb-display.log
> 2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused
> 2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused
> 2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused
> 2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
> 2005-04-01 16:18:05 Whoops ! bb failed to send message - timeout
> 2005-04-01 17:03:08 Whoops ! bb failed to send message - timeout
On Sat, Apr 02, 2005 at 01:13:00AM +0200, olivier at qalpit.com wrote:
> I'm still running RC6, and I see the same behaviour: several cores in tmp/ (about a dozen per day). They seem to be from bbtest-net, but there are also bbgen cores!
I'd like to see call-traces from those core files:
cd ~hobbit/server
gdb bin/bbgen tmp/core-from-bbgen
[messages from gdb starting up]
gdb> bt
and send me the output.
> I have also seen my hobbitd balk at listening on port 1984 (telnet localhost 1984 would not answer; a couple of seconds later it would).
> Henrik: can these two problems be related?
Perhaps ... but I wouldn't expect them to be, unless it was hobbitd that crashed.
Henrik
On Fri, Apr 01, 2005 at 05:22:42PM -0500, Deal, Richard wrote:
> My hobbitd is core dumping every so often and, less often but still occasionally, the trends column turns purple.
hobbitd crashing - that's bad.
Could you run the core dump through gdb and send me the call trace? Do this:
$ gdb ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd
[messages from gdb]
gdb> bt
and send me the output from that "bt" command.
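With a dozen cores a day, the gdb step can be scripted rather than typed each time. The following is a minimal sketch, not something from the thread: it assumes a gdb that supports the `-batch` and `-ex` options, that the core files sit in /tmp with names starting with `core`, and that they all came from hobbitd. `HOBBIT_BIN`, `CORE_DIR` and `backtrace_cmd` are hypothetical names. It only prints the commands, so nothing runs until they have been reviewed.

```shell
#!/bin/sh
# Assumptions (not from the thread): cores are named core* inside CORE_DIR,
# and all of them were produced by the binary in HOBBIT_BIN. Adjust as needed.
HOBBIT_BIN=${HOBBIT_BIN:-$HOME/server/bin/hobbitd}
CORE_DIR=${CORE_DIR:-/tmp}

# Print the non-interactive gdb invocation that dumps a backtrace.
# $1 = binary that crashed, $2 = core file
backtrace_cmd() {
    printf 'gdb -batch -ex bt %s %s\n' "$1" "$2"
}

# Dry run: emit one command per core file instead of executing it.
for core in "$CORE_DIR"/core*; do
    [ -f "$core" ] || continue
    backtrace_cmd "$HOBBIT_BIN" "$core"
done
```

Saved under a name such as print-bt.sh, the output can be checked and then piped to `sh` to collect all traces in one file, for example `sh print-bt.sh | sh > traces.txt 2>&1`. Paths containing spaces are not handled. For bbgen or bbtest-net cores, point `HOBBIT_BIN` at that binary instead.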
> Looking through the Makefile, the only oddity is MAXMSG=32768, whereas my old BBd was set to #define MAXLINE 11264.
Shouldn't cause any problems, it just means Hobbit will accept larger messages than your BB setup.
> more bb-display.log
> 2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused
Probably a result of hobbitd being down.
> I have a lot of these errors in larrd-data.log from various hosts.
> 2005-04-01 17:17:53 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from 172.17.10.20: expected 12 data source readings (got 16) from
The "netstat" and "vmstat" RRD files from LARRD are not compatible with Hobbit. Do a
find ~hobbit/data/rrd -name netstat.rrd | xargs rm -f
to delete the old files.
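Before deleting anything it may be worth listing what would be removed. The sketch below is an illustration under assumptions: it assumes the incompatible vmstat files are named vmstat.rrd in the same way the netstat files are named netstat.rrd (the command above only names netstat.rrd), and `list_legacy_rrds` and `RRD_DIR` are hypothetical names.

```shell
#!/bin/sh
# Assumption: the RRD tree lives under $HOME/data/rrd when run as the hobbit
# user; override RRD_DIR otherwise.
RRD_DIR=${RRD_DIR:-$HOME/data/rrd}

# List (do not delete) the LARRD-era files under the directory given as $1.
# Assumption: vmstat data is stored in files named vmstat.rrd.
list_legacy_rrds() {
    find "$1" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -print
}

# Only list when the directory exists, so the script is safe to run anywhere.
if [ -d "$RRD_DIR" ]; then
    list_legacy_rrds "$RRD_DIR"
fi
```

Once the list looks right, the same `find` expression can be fed to `xargs rm -f` as in the command above. Hobbit should then create fresh files in its own format on the next update.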
> 005-04-01 17:18:10 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature.module_6_asic-.rrd from 172.17.10.16: illegal attempt to update using time 1112393889 when last update time is 1112393889 (minimum one second step)
This is a bit more tricky. It means that the same RRD file was being updated by two status messages within one second - that normally should not happen, because a status is sent every 5 minutes. It can happen if you have two hosts reporting the same hostname (one of them would be the 172.17.10.16 IP you have in that error message).
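One way to check for two hosts reporting the same hostname is to tally these errors per RRD file and sender IP. This is a rough sketch that assumes the log lines have exactly the shape quoted above (the RRD path immediately before the word "from", the sender IP immediately after it, each error on a single line); `summarize_dup_updates` is a hypothetical helper name.

```shell
#!/bin/sh
# Count "illegal attempt to update" errors per (RRD file, sender IP) pair.
# Reads the log files given as arguments, or standard input if none are given.
summarize_dup_updates() {
    awk '/illegal attempt to update/ {
        # Find the word "from": the field before it is the RRD path,
        # the field after it is the sender IP followed by a colon.
        for (i = 2; i < NF; i++) {
            if ($i == "from") {
                ip = $(i + 1)
                sub(/:$/, "", ip)
                print $(i - 1), ip
                break
            }
        }
    }' "$@" | sort | uniq -c
}
```

Calling `summarize_dup_updates larrd-status.log` prints one line per file and IP with a count in front. The same RRD path showing up with two different IPs would point to two machines using one hostname; a single IP with a high count suggests that one client is sending the same status twice.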
Regards, Henrik
participants (3)
- henrik@hswn.dk
- olivier@qalpit.com
- rdeal@tigr.org