hobbitd crashing (both 4.0.2 and latest snapshot) on Solaris 8
After adding additional tests today I saw that some of my tests weren't getting through. I checked their logs and saw repeated lines like:
2007-04-10 20:17:59 Could not connect to bbd at xxx.xxx.xxx.xxx:1985 - Connection refused
2007-04-10 20:17:59 Whoops ! bb failed to send message - Connection failed
(I replaced the actual IP with xxx.xxx.xxx.xxx.)
Additionally, both in my previous install of hobbit and the current snapshot I'm seeing the following messages written to /var/log/hobbit/hobbitlaunch.log:
2007-04-10 18:13:16 Setting up network listener on 0.0.0.0:1985
2007-04-10 18:13:16 Setting up local listener
2007-04-10 18:13:17 Task bbdisplay terminated by signal 15
2007-04-10 18:13:17 Setting up signal handlers
2007-04-10 18:13:17 Setting up hobbitd channels
2007-04-10 18:13:17 Setting up logfiles
2007-04-10 18:16:21 Task hobbitd terminated by signal 6
2007-04-10 18:16:21 Task bbdisplay terminated by signal 15
I will install gdb and see if I can get more info to provide but am wondering if this is a known issue. I need help as I need to complete a bb4->hobbit migration by next Friday, have a meeting scheduled tomorrow to show my progress, and hobbit is "dead" at the moment.
Help? stephen
On Tue, Apr 10, 2007 at 06:28:43PM -0700, Menton, Stephen wrote:
2007-04-10 18:13:17 Setting up logfiles
2007-04-10 18:16:21 Task hobbitd terminated by signal 6
This should leave a core dump in ~hobbit/server/tmp/. Please run it through gdb; see http://www.hswn.dk/hobbit/help/known-issues.html#bugreport
If possible, send me (directly, off-list) a copy of your bb-hosts file, the hobbitserver.cfg file, and the ~hobbit/server/tmp/hobbitd.chk file. Note that these have lots of information about the hosts you're monitoring, so if that is considered confidential make sure you're allowed to send it to me.
Regards, Henrik
On Tue, Apr 10, 2007 at 06:28:43PM -0700, Menton, Stephen wrote:
Additionally, both in my previous install of hobbit and the current snapshot I'm seeing the following messages written to /var/log/hobbit/hobbitlaunch.log:
2007-04-10 18:13:16 Setting up network listener on 0.0.0.0:1985
2007-04-10 18:13:16 Setting up local listener
2007-04-10 18:13:17 Task bbdisplay terminated by signal 15
2007-04-10 18:13:17 Setting up signal handlers
2007-04-10 18:13:17 Setting up hobbitd channels
2007-04-10 18:13:17 Setting up logfiles
2007-04-10 18:16:21 Task hobbitd terminated by signal 6
Stephen and I worked on this yesterday, and the problem turned out to be a general one that his particular set of tests just happened to trigger easily. Specifically, he was sending a fairly large status message from a script as one single line, i.e.
$BB $BBDISP "status myhost.customtest green .... <several KB data>"
and this would cause hobbitd to crash the next time it had to update the web pages.
You wouldn't have to have very long status messages to trigger this; having a lot of smaller ones is enough. So this could be the reason for some of the unexplained hobbitd crashes reported over time.
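To see whether a custom test script sends messages of the kind described above, something like the following rough sketch can report how large a status message is and how long its longest line is before it is handed to $BB. The msg_stats function and the sample message are hypothetical and only for illustration; they are not part of Hobbit, and the sketch only measures the message, it does not fix the crash.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hobbit: print "<total> <longest>" for the
# status message passed as $1, where <total> is the character count including
# one newline per line and <longest> is the length of the longest line.
msg_stats() {
    printf '%s\n' "$1" | awk '
        { total += length($0) + 1
          if (length($0) > longest) longest = length($0) }
        END { print total, longest + 0 }'
}

# Sample message in the same form as the one quoted in this thread.
MSG="status myhost.customtest green `date` all ok"
msg_stats "$MSG"

# Send only when the Hobbit client environment is actually set up.
if [ -n "$BB" ] && [ -n "$BBDISP" ]; then
    $BB $BBDISP "$MSG"
fi
```

A script whose messages show a total of several KB with the longest line close to that total matches the single-line case Stephen ran into.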
The attached patch solves this. I'll be updating the "allinone" patch later today with this.
Regards, Henrik