I am getting a sporadic problem with the latest release (RC4). bb-hostsvc.cgi starts returning the message "Status not available" for all services. After a restart of the hobbitd daemon, everything is back to normal. There doesn't seem to be a pattern yet; it happens roughly every hour or so. Note that I am running on Solaris 2.8.
- Brian
It looks like this problem also clears up on its own if I wait a few minutes.
On Mon, 28 Feb 2005 10:59:51 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
More information: it looks like the hobbitd daemon is dying sporadically, with the following entries repeating each time in hobbitlaunch.log:
2005-02-28 06:45:21 Task bbhistory started with PID 22393
2005-02-28 06:45:21 Task bbenadis started with PID 22395
2005-02-28 06:45:21 Task bbpage started with PID 22397
2005-02-28 06:45:21 Task larrdstatus started with PID 22399
2005-02-28 06:45:21 Task larrddata started with PID 22401
2005-02-28 06:45:26 Task bbnet started with PID 22428
2005-02-28 06:45:32 Task bbdisplay started with PID 22455
2005-02-28 06:45:32 Task bbretest started with PID 22456
2005-02-28 06:45:32 Task bbdisplay-noc started with PID 22457
2005-02-28 06:46:35 Task bbdisplay started with PID 22761
2005-02-28 06:46:35 Task bbretest started with PID 22762
2005-02-28 06:46:35 Task bbdisplay-noc started with PID 22767
2005-02-28 06:47:35 Task bbdisplay started with PID 23057
2005-02-28 06:47:35 Task bbretest started with PID 23060
2005-02-28 06:47:36 Task bbdisplay-noc started with PID 23067
2005-02-28 06:48:35 Task bbdisplay started with PID 23349
2005-02-28 06:48:35 Task bbretest started with PID 23350
2005-02-28 06:48:40 Task bbdisplay-noc started with PID 23380
2005-02-28 06:49:35 Task bbdisplay started with PID 23644
2005-02-28 06:49:35 Task bbretest started with PID 23645
2005-02-28 06:49:40 Task bbdisplay-noc started with PID 23674
2005-02-28 06:50:16 Task hobbitd terminated by signal 6
2005-02-28 06:50:16 Loading hostnames
2005-02-28 06:50:16 Loading saved state
2005-02-28 06:50:16 Task hobbitd started with PID 23850
2005-02-28 06:50:16 Too few fields in record - found 1, expected 17
2005-02-28 06:50:16 Too few fields in record - found 1, expected 17
2005-02-28 06:50:16 Too few fields in record - found 1, expected 17
2005-02-28 06:50:16 Too few fields in record - found 1, expected 17
2005-02-28 06:50:16 Too few fields in record - found 1, expected 17
2005-02-28 06:50:16 Setting up network listener on 0.0.0.0:1984
2005-02-28 06:50:16 Setting up signal handlers
2005-02-28 06:50:16 Setting up hobbitd channels
2005-02-28 06:50:16 Setting up logfiles
On Mon, 28 Feb 2005 11:18:39 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
It is also dumping core. Here are the gdb results from the core file. Going to try turning debugging on next:
bash-2.03$ gdb ../bin/hobbitd core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
Core was generated by `hobbitd --restart=/opt/hobbit/server/tmp/hobbitd.chk --checkpoint-file=/opt/hob'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
(gdb) backtrace
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x0001cf54 in sigsegv_handler (signum=-14958584) at sig.c:57
#3 <signal handler called>
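The backtrace shows sigsegv_handler() at sig.c:57 calling abort(). That would explain why hobbitlaunch logs "terminated by signal 6": the underlying fault is most likely a segmentation violation, and the handler turns it into SIGABRT to force a core dump. The real sig.c is not shown here, so the handler body below is an assumption, and install_segv_handler() and signal_seen_by_parent() are hypothetical names. The sketch forks a child that raises SIGSEGV and reports which signal the parent observes through waitpid(), which is how a supervising process learns the signal number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed shape of the handler in sig.c: turn a segmentation fault into
 * abort(), which dumps core and terminates the process with SIGABRT. */
static void sigsegv_handler(int signum)
{
    (void)signum;
    abort();
}

static int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigsegv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Fork a child that "crashes" and return the signal number the parent
 * sees in waitpid(), or -1 if the child was not killed by a signal. */
static int signal_seen_by_parent(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        install_segv_handler();
        raise(SIGSEGV);   /* stands in for a real invalid memory access */
        _exit(0);         /* not reached */
    }
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With the handler installed, the parent sees SIGABRT (signal 6 on Solaris and Linux) rather than SIGSEGV, matching the hobbitlaunch.log entries. It also means the frames below "<signal handler called>" are where the actual fault happened.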
On Mon, 28 Feb 2005 13:56:55 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
On Mon, Feb 28, 2005 at 02:00:34PM -0800, Brian Lynch wrote:
It is also dumping core... here is the gdb results from the core file. [snip]
(gdb) backtrace
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x0001cf54 in sigsegv_handler (signum=-14958584) at sig.c:57
#3 <signal handler called>
No more lines than that? I'd expect a few more, unless there's some serious memory corruption taking place.
Henrik
I have it running with '--debug' now. Hopefully this will yield more information.
- Brian
On Mon, 28 Feb 2005 23:10:14 +0100, Henrik Stoerner <henrik at hswn.dk> wrote:
[snipped]
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
I compiled with -DDEBUG and enabled --debug at runtime for hobbitd. Same output from gdb:
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
(gdb) backtrace
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x000281f4 in sigsegv_handler (signum=0) at sig.c:57
#3 <signal handler called>
Here are the sections of each log:
hobbitlaunch.log
2005-02-28 22:24:56 Task bbhistory started with PID 2567
2005-02-28 22:24:56 Task bbenadis started with PID 2572
2005-02-28 22:24:56 Task bbpage started with PID 2575
2005-02-28 22:24:56 Task larrdstatus started with PID 2578
2005-02-28 22:24:56 Task larrddata started with PID 2580
2005-02-28 22:24:56 Task bbdisplay started with PID 2582
2005-02-28 22:24:56 Task bbnet started with PID 2584
2005-02-28 22:24:56 Task bbretest started with PID 2585
2005-02-28 22:24:58 Task larrdcolumn started with PID 2608
2005-02-28 22:24:58 Task infocolumn started with PID 2609
2005-02-28 22:25:16 Task hobbitd terminated by signal 6
2005-02-28 22:25:17 Loading hostnames
2005-02-28 22:25:17 Task hobbitd started with PID 2712
2005-02-28 22:25:17 Loading saved state
2005-02-28 22:25:17 Setting up network listener on 0.0.0.0:1984
2005-02-28 22:25:17 Setting up signal handlers
2005-02-28 22:25:17 Setting up hobbitd channels
2005-02-28 22:25:17 Setting up status channel (id=1)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',1)
2005-02-28 22:25:17 ftok() returns: 0x10078B4
2005-02-28 22:25:17 shmget() returns: 0x8FC
2005-02-28 22:25:17 Setting up stachg channel (id=2)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',2)
2005-02-28 22:25:17 ftok() returns: 0x20078B4
2005-02-28 22:25:17 shmget() returns: 0x8FD
2005-02-28 22:25:17 Setting up page channel (id=3)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',3)
2005-02-28 22:25:17 ftok() returns: 0x30078B4
2005-02-28 22:25:17 shmget() returns: 0x8FE
2005-02-28 22:25:17 Setting up data channel (id=4)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',4)
2005-02-28 22:25:17 ftok() returns: 0x40078B4
2005-02-28 22:25:17 shmget() returns: 0x8FF
2005-02-28 22:25:17 Setting up notes channel (id=5)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',5)
2005-02-28 22:25:17 ftok() returns: 0x50078B4
2005-02-28 22:25:17 shmget() returns: 0x900
2005-02-28 22:25:17 Setting up enadis channel (id=6)
2005-02-28 22:25:17 calling ftok('/opt/hobbit/server',6)
2005-02-28 22:25:17 ftok() returns: 0x60078B4
2005-02-28 22:25:17 shmget() returns: 0xB59
2005-02-28 22:25:17 Setting up logfiles
2005-02-28 22:25:17 Task bbenadis terminated, status 1
2005-02-28 22:25:22 Task bbhistory started with PID 2740
2005-02-28 22:25:22 Task bbenadis started with PID 2743
2005-02-28 22:25:22 Task bbpage started with PID 2747
2005-02-28 22:25:22 Task larrdstatus started with PID 2750
2005-02-28 22:25:22 Task larrddata started with PID 2752
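The channel setup in this log derives one System V IPC key per channel by calling ftok() on the server directory with the channel id, and then calls shmget(), presumably with that key. The key-derivation step can be sketched as follows, assuming six channels numbered 1 to 6 as in the log; channel_keys() is a hypothetical name, and the shmget() call is left out. POSIX only promises the same key for the same file and id; the id ending up in the key's top byte (0x10078B4, 0x20078B4, ...) is an implementation detail that this log happens to show.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>

/* status, stachg, page, data, notes, enadis, as in the log */
#define NCHANNELS 6

/* Derive one System V IPC key per channel id (1..NCHANNELS) from the same
 * directory. Each key would then be passed to shmget(). Returns 0 on
 * success, -1 if ftok() fails, e.g. because the directory is missing. */
static int channel_keys(const char *dir, key_t keys[NCHANNELS])
{
    for (int id = 1; id <= NCHANNELS; id++) {
        keys[id - 1] = ftok(dir, id);
        if (keys[id - 1] == (key_t)-1) {
            perror("ftok");
            return -1;
        }
    }
    return 0;
}
```

Because every key depends on the directory's inode and device numbers, the keys stay the same across hobbitd restarts (as the 0x10078B4 ... 0x60078B4 values in this log and the later one show), while the shmget() identifiers change.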
hobbitd.log
2005-02-28 22:15:16 Posting message 12 to 1 readers
2005-02-28 22:15:16 Message posted
2005-02-28 22:15:16 Posting message 13 to 1 readers
2005-02-28 22:15:16 Message posted
2005-02-28 22:15:16 oldcolor=6, oldas=2, newcolor=0, newas=0
2005-02-28 22:15:16 posting to stachg channel
2005-02-28 22:15:17 Setup complete
2005-02-28 22:15:17 Sending heartbeat to pid 28397
2005-02-28 22:15:18 posting to status channel
2005-02-28 22:15:18 Dropping message - no readers
2005-02-28 22:15:19 posting to status channel
2005-02-28 22:15:19 Dropping message - no readers
2005-02-28 22:15:21 posting to status channel
2005-02-28 22:15:21 Dropping message - no readers
2005-02-28 22:15:22 Sending heartbeat to pid 28397
2005-02-28 22:15:22 posting to status channel
2005-02-28 22:15:22 Dropping message - no readers
2005-02-28 22:15:23 posting to status channel
2005-02-28 22:15:23 Posting message 1 to 1 readers
2005-02-28 22:15:23 Message posted
2005-02-28 22:15:24 posting to status channel
2005-02-28 22:15:24 Posting message 2 to 1 readers
2005-02-28 22:15:24 Message posted
2005-02-28 22:15:26 posting alert to page channel
2005-02-28 22:15:26 Posting message 1 to 1 readers
2005-02-28 22:15:26 Message posted
2005-02-28 22:15:26 posting to status channel
2005-02-28 22:15:26 Posting message 3 to 1 readers
2005-02-28 22:15:26 Message posted
2005-02-28 22:15:28 Sending heartbeat to pid 28397
On Mon, 28 Feb 2005 14:11:43 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
Is it possible that the SIGUSR2 signal is not being sent properly? What signal does hobbitlaunch send to kill hobbitd if it doesn't receive the heartbeat?
The HEARTBEAT keyword can only be used for one task, and is specifically aimed at monitoring the hobbitd(8) task. This task must send a SIGUSR2 signal to hobbitlaunch regularly; if this signal fails to arrive for more than 15 seconds, hobbitlaunch will kill the running task and start up a new one. hobbitd(8) will send this signal every 5 seconds.
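The heartbeat rule quoted above (SIGUSR2 from the monitored task, restart after more than 15 seconds of silence) can be sketched on the launcher side as follows. This is an illustration, not hobbitlaunch's actual code: only SIGUSR2 and the 15-second timeout come from the documentation, and install_heartbeat_handler() and heartbeat_check() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Timeout from the documentation; everything else here is assumed. */
#define HEARTBEAT_TIMEOUT 15

/* Set by the SIGUSR2 handler. Only a sig_atomic_t flag is touched inside
 * the handler, which keeps it async-signal-safe. */
static volatile sig_atomic_t heartbeat_seen = 0;

static void sigusr2_handler(int signum)
{
    (void)signum;
    heartbeat_seen = 1;
}

/* Install the handler in the launcher process. Returns 0 on success. */
static int install_heartbeat_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigusr2_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Called from the launcher's main loop. Records any heartbeat that arrived
 * since the last call, then returns 1 if more than HEARTBEAT_TIMEOUT
 * seconds have passed without one (time to kill and restart the task). */
static int heartbeat_check(time_t *last_beat, time_t now)
{
    if (heartbeat_seen) {
        heartbeat_seen = 0;
        *last_beat = now;
    }
    return (now - *last_beat) > HEARTBEAT_TIMEOUT;
}
```

With hobbitd sending a heartbeat every 5 seconds, two consecutive signals would have to be lost before this check fires, so a watchdog restart should be rare compared with a crash.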
On Mon, 28 Feb 2005 14:26:55 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
Scratch that last message. Hobbitd appears to be dying independently of hobbitlaunch.
On Mon, 28 Feb 2005 14:41:26 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
On Mon, Feb 28, 2005 at 02:26:55PM -0800, Brian Lynch wrote:
[snipped]
Not much to go by in those traces, really.
I noticed in the first report you sent that hobbitd seems to crash about 5 minutes after it is started. This would coincide with the time when it sends in a status report about itself. Can you verify this? What is the interval between the "Task hobbitd started" messages?
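The interval can be computed directly from the timestamp at the start of each hobbitlaunch.log line. A minimal sketch, assuming only the "YYYY-MM-DD HH:MM:SS" prefix seen in the logs in this thread; parse_log_time() and seconds_between() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse the "YYYY-MM-DD HH:MM:SS" prefix of a log line into a time_t.
 * Returns 0 on success, -1 if the prefix is missing or invalid. */
static int parse_log_time(const char *line, time_t *out)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    if (sscanf(line, "%d-%d-%d %d:%d:%d", &tm.tm_year, &tm.tm_mon,
               &tm.tm_mday, &tm.tm_hour, &tm.tm_min, &tm.tm_sec) != 6)
        return -1;
    tm.tm_year -= 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon -= 1;       /* and months from 0 */
    tm.tm_isdst = -1;     /* let mktime() decide about daylight saving */
    *out = mktime(&tm);
    return (*out == (time_t)-1) ? -1 : 0;
}

/* Seconds from the earlier log line to the later one. Returns -1 if
 * either line cannot be parsed, so only pass lines in time order. */
static long seconds_between(const char *earlier, const char *later)
{
    time_t a, b;
    if (parse_log_time(earlier, &a) != 0 || parse_log_time(later, &b) != 0)
        return -1;
    return (long)difftime(b, a);
}
```

Applied to the first entry of the first report (06:45:21) and the "terminated by signal 6" entry (06:50:16), it gives 295 seconds, just under 5 minutes. Running it over consecutive "Task hobbitd started" lines gives the restart interval directly.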
If it's 15 minutes, then it points more in the direction of the checkpoint code, and the errors you showed initially "Too few fields in record - found 1, expected 17" also point in that direction.
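The "Too few fields in record - found 1, expected 17" message suggests that each checkpoint record is split into fields and rejected when fewer than 17 are present. A record with exactly one field is what an empty or truncated line produces when it contains no separator at all. A sketch of such a check, assuming '|' as the field separator (the actual checkpoint format is not shown in this thread); count_fields() and check_record() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Taken from the log message; the field layout itself is not shown. */
#define EXPECTED_FIELDS 17

/* Count fields in a record, assuming '|' separates them. A line with no
 * separator at all (for example an empty or truncated one) counts as 1. */
static int count_fields(const char *record)
{
    int n = 1;
    for (const char *p = record; *p != '\0'; p++)
        if (*p == '|')
            n++;
    return n;
}

/* Returns 0 if the record looks complete. Otherwise logs the problem and
 * returns -1 so the caller can skip the record rather than load garbage. */
static int check_record(const char *record)
{
    int found = count_fields(record);
    if (found < EXPECTED_FIELDS) {
        fprintf(stderr, "Too few fields in record - found %d, expected %d\n",
                found, EXPECTED_FIELDS);
        return -1;
    }
    return 0;
}
```

Under this assumption, five "found 1" errors right after a restart would point at five blank or cut-off lines in hobbitd.chk, for example a checkpoint that was still being written when the daemon died.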
You could also try enabling the memory-debug code. In lib/memory.h, change the line
#undef MEMORY_DEBUG
to
#define MEMORY_DEBUG 1
Then do a "make allclean; make; make install" and restart hobbit. It will still crash, but hopefully a little earlier, before it smashes the stack, so that gdb can show more of the backtrace.
Regards, Henrik
Important correction:
On Mon, Feb 28, 2005 at 11:59:18PM +0100, Henrik Stoerner wrote:
Then do a "make allclean; make; make install"
Don't do the "make install"; just copy hobbitd/hobbitd over to ~hobbit/server/bin/ and restart it.
Henrik
Tried the compile with MEMORY_DEBUG and got this stacktrace from the core dump:
bash-2.03$ gdb bin/hobbitd core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
Core was generated by `hobbitd --debug --restart=/opt/hobbit/server/tmp/hobbitd.chk --checkpoint-file='.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x000270b8 in xsprintf (dest=0x2e378 "xsprintf: Bogus destination\n", fmt=0x0) at memory.c:334
#3 0x00013ad4 in posttochannel (channel=0x2e378, channelmarker=0x0, msg=0x0, sender=0x0, hostname=0x3f2e8 "", log=0x0, readymsg=0x0) at hobbitd.c:436
(gdb)
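This backtrace shows xsprintf() (memory.c:334) calling abort() while being called from posttochannel() at hobbitd.c:436, and the "Bogus destination" string suggests that the debug build validates the destination pointer before writing to it. The real memory.c is not shown, so the following is a hypothetical sketch of that kind of check, assuming the debug allocator records the size of every block it hands out; dbg_malloc(), tracked_size() and checked_sprintf() are illustrative names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKED 64

/* Hypothetical debug allocator: remembers the size of every block so that
 * later writes can be checked against it (blocks beyond MAX_TRACKED are
 * not recorded and would be reported as bogus). */
struct tracked_block {
    void *ptr;
    size_t size;
};
static struct tracked_block blocks[MAX_TRACKED];
static int nblocks = 0;

static void *dbg_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && nblocks < MAX_TRACKED) {
        blocks[nblocks].ptr = p;
        blocks[nblocks].size = size;
        nblocks++;
    }
    return p;
}

/* Size of the tracked block starting at p, or 0 if p is unknown. */
static size_t tracked_size(const void *p)
{
    for (int i = 0; i < nblocks; i++)
        if (blocks[i].ptr == p)
            return blocks[i].size;
    return 0;
}

/* sprintf() replacement that refuses unknown destinations and writes that
 * would run past the end of the block. Returns 0 on success, -1 if the
 * call was rejected; a debug build might call abort() at that point
 * instead, as xsprintf() does in the backtrace, so the process dies
 * before memory is corrupted. */
static int checked_sprintf(char *dest, const char *fmt, ...)
{
    size_t cap = tracked_size(dest);
    if (cap == 0 || fmt == NULL) {
        fprintf(stderr, "checked_sprintf: Bogus destination\n");
        return -1;
    }
    va_list ap;
    va_start(ap, fmt);
    int needed = vsnprintf(dest, cap, fmt, ap);
    va_end(ap);
    if (needed < 0 || (size_t)needed >= cap) {
        fprintf(stderr, "checked_sprintf: %d bytes do not fit in %zu\n",
                needed, cap);
        return -1;
    }
    return 0;
}
```

The point of such a check is exactly what the earlier reply hoped for: the process stops at the first bad write, so the backtrace ends in the caller (here posttochannel()) instead of in a generic signal handler.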
On Tue, 1 Mar 2005 00:03:57 +0100, Henrik Stoerner <henrik at hswn.dk> wrote:
[snipped]
Could this be a problem with the version of gcc or some other package? Version details below:
Reading specs from /usr/local/gcc-3.3.1/lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs
Configured with: ../configure --prefix=/usr/local/gcc-3.3.1
Thread model: posix
gcc version 3.3.1
On Mon, 28 Feb 2005 15:13:05 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
[snipped]
Tried gcc 3.4.2 with the same result:
2005-03-01 06:30:16 Task hobbitd terminated by signal 6
2005-03-01 06:30:16 Loading hostnames
2005-03-01 06:30:16 Loading saved state
2005-03-01 06:30:16 Cannot access checkpoint file /opt/hobbit/server/tmp/hobbitd.chk for restore
2005-03-01 06:30:16 Setting up network listener on 0.0.0.0:1984
2005-03-01 06:30:16 Setting up signal handlers
2005-03-01 06:30:16 Setting up hobbitd channels
2005-03-01 06:30:16 Setting up status channel (id=1)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',1)
2005-03-01 06:30:16 ftok() returns: 0x10078B4
2005-03-01 06:30:16 shmget() returns: 0xA8C
2005-03-01 06:30:16 Setting up stachg channel (id=2)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',2)
2005-03-01 06:30:16 ftok() returns: 0x20078B4
2005-03-01 06:30:16 shmget() returns: 0xA8D
2005-03-01 06:30:16 Setting up page channel (id=3)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',3)
2005-03-01 06:30:16 ftok() returns: 0x30078B4
2005-03-01 06:30:16 shmget() returns: 0xA8E
2005-03-01 06:30:16 Setting up data channel (id=4)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',4)
2005-03-01 06:30:16 ftok() returns: 0x40078B4
2005-03-01 06:30:16 shmget() returns: 0xA8F
2005-03-01 06:30:16 Setting up notes channel (id=5)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',5)
2005-03-01 06:30:16 ftok() returns: 0x50078B4
2005-03-01 06:30:16 shmget() returns: 0xA2C
2005-03-01 06:30:16 Setting up enadis channel (id=6)
2005-03-01 06:30:16 calling ftok('/opt/hobbit/server',6)
2005-03-01 06:30:16 ftok() returns: 0x60078B4
2005-03-01 06:30:16 shmget() returns: 0x10D1
2005-03-01 06:30:16 Setting up logfiles
2005-03-01 06:30:16 Task hobbitd started with PID 3579
2005-03-01 06:30:16 Task bbenadis terminated, status 1
2005-03-01 06:30:21 Task bbhistory started with PID 3605
2005-03-01 06:30:21 Task bbenadis started with PID 3607
2005-03-01 06:30:21 Task bbpage started with PID 3609
2005-03-01 06:30:21 Task larrdstatus started with PID 3611
2005-03-01 06:30:21 Task larrddata started with PID 3613
2005-03-01 06:30:31 Task bbdisplay started with PID 3666
2005-03-01 06:30:31 Task bbretest started with PID 3668
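The debug log above shows hobbitd deriving one System V IPC key per channel by calling ftok() on /opt/hobbit/server with the channel id, then calling shmget() with that key. A minimal sketch of that pattern follows, assuming a POSIX system. channel_key() and channel_segment() are hypothetical helpers for illustration, not Hobbit's actual code, and the 0600 permissions are an assumption. In the log the channel id shows up in the top byte of each key (0x10078B4 for id 1, 0x20078B4 for id 2, and so on); that matches common ftok() implementations, but POSIX does not guarantee the layout.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper (not Hobbit's code): compute the System V IPC key
 * for one channel. The channel number is passed as ftok()'s project id,
 * so the same directory yields a distinct key per channel. */
key_t channel_key(const char *dir, int channel_id)
{
    /* POSIX: only the low 8 bits of the id are significant, and the
     * result is unspecified if those bits are all zero. */
    assert((channel_id & 0xff) != 0);
    return ftok(dir, channel_id);   /* (key_t)-1 if dir cannot be stat()ed */
}

/* Hypothetical helper: create, or look up, the shared-memory segment
 * for a channel key. Returns the segment id, or -1 on error. */
int channel_segment(key_t key, size_t size)
{
    return shmget(key, size, IPC_CREAT | 0600);
}
```

Because ftok() returns the same key for the same file and id, every process that calls channel_key() with the same directory and channel number should arrive at the same shared-memory segment, which is what lets the separate worker modules attach to hobbitd's channels.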
On Mon, 28 Feb 2005 19:39:05 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
Could this be a problem with the version of gcc or some other package? The version is below:
Reading specs from /usr/local/gcc-3.3.1/lib/gcc-lib/sparc-sun-solaris2.8/3.3.1/specs
Configured with: ../configure --prefix=/usr/local/gcc-3.3.1
Thread model: posix
gcc version 3.3.1
On Mon, 28 Feb 2005 15:13:05 -0800, Brian Lynch <brianlynch at gmail.com> wrote:
Tried compiling with MEMORY_DEBUG and got this stack trace from the core dump:
bash-2.03$ gdb bin/hobbitd core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.8"...
Core was generated by `hobbitd --debug --restart=/opt/hobbit/server/tmp/hobbitd.chk --checkpoint-file='.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff19fc14 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x000270b8 in xsprintf (dest=0x2e378 "xsprintf: Bogus destination\n", fmt=0x0) at memory.c:334
#3 0x00013ad4 in posttochannel (channel=0x2e378, channelmarker=0x0, msg=0x0, sender=0x0, hostname=0x3f2e8 "", log=0x0, readymsg=0x0) at hobbitd.c:436
(gdb)
On Tue, 1 Mar 2005 00:03:57 +0100, Henrik Stoerner <henrik at hswn.dk> wrote:
Important correction:
On Mon, Feb 28, 2005 at 11:59:18PM +0100, Henrik Stoerner wrote:
Then do a "make allclean; make; make install"
Don't do the "make install"; just copy hobbitd/hobbitd over to ~hobbit/server/bin/ and restart it.
Henrik
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Note that this seems to happen after several selections of the bb-hostsvc.sh script, as opposed to every 15 minutes.
- Brian
On Mon, Feb 28, 2005 at 03:13:05PM -0800, Brian Lynch wrote:
Tried the compile with MEMORY_DEBUG and got this stacktrace from the core dump:
bash-2.03$ gdb bin/hobbitd core
GNU gdb 6.0
Program terminated with signal 6, Aborted.
#1 0xff13598c in abort () from /usr/lib/libc.so.1
#2 0x000270b8 in xsprintf (dest=0x2e378 "xsprintf: Bogus destination\n", fmt=0x0) at memory.c:334
#3 0x00013ad4 in posttochannel (channel=0x2e378, channelmarker=0x0, msg=0x0, sender=0x0, hostname=0x3f2e8 "", log=0x0, readymsg=0x0) at hobbitd.c:436
(gdb)
This looks really weird.
Line 436 of hobbitd.c can only be reached if "readymsg" is not NULL. But the call trace shows it as NULL.
The posttochannel() routine is always called with a message and a sender. Yet both of those parameters show up as NULL in the call trace.
I'd like to have a copy of this core-dump, the ~hobbit/server/bin/hobbitd binary, your entire /var/log/hobbit/ directory, the ~hobbit/server/tmp/hobbitd.chk file, and your ~hobbit/server/etc/bb-hosts and hobbitserver.cfg files.
Yes, this includes a bunch of info about your locally used IP addresses and hostnames. Check with your boss whether it is OK to hand them out.
Is anyone else seeing repeated crashes of hobbitd? I haven't heard of any (quite the contrary), so I'm wondering why this happens so often in just your installation.
Regards, Henrik
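In the MEMORY_DEBUG build, xsprintf() aborted with "xsprintf: Bogus destination", meaning a debug check on its arguments fired. The sketch below illustrates the general idea of such an argument-checking wrapper. checked_sprintf() is a hypothetical name and this is not the code in Hobbit's memory.c; unlike the abort() seen in the trace, it reports a bad destination or a truncated result and returns -1 so the caller can decide what to do.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug wrapper (not Hobbit's memory.c): format into dest,
 * but refuse a NULL or zero-sized destination and any output that would
 * be truncated. Returns the number of characters written, or -1. */
int checked_sprintf(char *dest, size_t destsize, const char *fmt, ...)
{
    va_list ap;
    int n;

    /* A NULL format string is a programming error, so fail loudly. */
    assert(fmt != NULL);

    if (dest == NULL || destsize == 0) {
        fprintf(stderr, "checked_sprintf: Bogus destination\n");
        return -1;
    }

    va_start(ap, fmt);
    n = vsnprintf(dest, destsize, fmt, ap);   /* never writes past destsize */
    va_end(ap);

    if (n < 0 || (size_t)n >= destsize) {
        fprintf(stderr, "checked_sprintf: output truncated\n");
        return -1;
    }
    return n;
}
```

Returning an error instead of aborting trades an immediate core dump (useful for gdb, as in this thread) for a process that keeps running; a debug build would typically prefer the abort.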
I've found something that might explain it, but I am not at all sure it is the cause of your problems. This bug would only trigger if some client of yours was sending in very large "data" messages, like 100 KB or more.
Could you try the attached patch and let me know if it solves your problem?
If not, I would like you to try disabling all of the hobbitd worker modules in hobbitlaunch.cfg, i.e. bbstatus, bbhistory, bbdata, bbnotes, bbenadis, bbpage, larrdstatus and larrddata. Restart hobbit and see if it still crashes. If it doesn't, then we've narrowed down the problem; if it does, then the problem is somewhere other than where I've been looking so far.
Thanks, Henrik
Henrik,
Tried the patch and hobbitd still crashed.
Tried the disable module test. Disabled all the modules you mentioned above and hobbitd has not crashed since 8:30 PST this morning. I'll try adding modules back in until it crashes.
- Brian
OK... I enabled the modules one at a time until I reached 'larrdstatus'. When I enabled that module, hobbitd began crashing again. Here are the contents of hobbitlaunch.cfg:
# The hobbittasks.cfg file is loaded by "hobbitlaunch".
# It controls which of the Hobbit modules to run, how often, and
# with which parameters, options and environment variables.
#

# This is the main Hobbit daemon. You cannot live without this one.
[hobbitd]
	HEARTBEAT
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	CMD hobbitd --debug --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log --admin-senders=127.0.0.1,$BBSERVERIP

# "bbstatus" saves status-logs in text- and html-format, like the old Big Brother
# daemon does. Unless you are using add-ons that directly access the log-files, you
# will not need to run this module, and it is recommended that you keep it disabled
# since storing the raw logs on disk can cause a significant load on your server.
[bbstatus]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=status --log=$BBSERVERLOGS/status.log hobbitd_filestore --status --html

# "bbhistory" keeps track of the status changes that happen, in a manner that is
# compatible with the Big Brother history logs. You probably do want to run this.
[bbhistory]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=stachg --log=$BBSERVERLOGS/history.log hobbitd_history

# "bbdata" saves information sent using the BB "data" protocol, like the old Big Brother
# daemon does. Unless you are using add-ons that directly access the data-files, you
# will not need to run this module, and it is recommended that you keep it disabled
# since storing the raw data on disk can cause a significant load on your BB server.
# LARRD uses data-files, but since LARRD is handled by the hobbitd_larrd module, you do
# not need to run the "bbdata" module to get LARRD graphs.
[bbdata]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/data.log hobbitd_filestore --data

# "bbnotes" saves web note-files, that are sent using the BB "notes" protocol. This is
# disabled by default; if you use the BB "notes" protocol, then you should enable this.
[bbnotes]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=notes --log=$BBSERVERLOGS/notes.log hobbitd_filestore --notes

# "bbenadis" updates the files used to indicate that a host or test has been enabled or disabled.
# These files are used by bbgen and the "maint.pl" script to determine what is currently enabled
# and disabled, so you probably want to run this module.
[bbenadis]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=enadis --log=$BBSERVERLOGS/enadis.log hobbitd_filestore --enadis

# "bbpage" sends out alerts. Note that this module is NOT compatible with the old Big Brother
# system - it uses a different configuration file to determine how alerts get sent. If you want
# alerts to go out via pager, e-mail or some other means, then you must run this module.
[bbpage]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=page --log=$BBSERVERLOGS/page.log hobbitd_alert

# "larrdstatus" updates RRD files with information that arrives as "status" messages.
# If you want RRD graphs of your monitoring data, then you want to run this.
[larrdstatus]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=status --log=$BBSERVERLOGS/larrd-status.log hobbitd_larrd --rrddir=$BBVAR/rrd

# "larrddata" updates RRD files with information that arrives as "data" messages.
# If you want RRD graphs of your monitoring BB data, then you want to run this.
[larrddata]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD hobbitd_channel --channel=data --log=$BBSERVERLOGS/larrd-data.log hobbitd_larrd --rrddir=$BBVAR/rrd

# "bbdisplay" runs the bbgen tool to generate the Hobbit webpages from the status information that
# has been received. Big Brother updated the webpages once every 5 minutes. The default here is to
# run it every minute for faster updates, but you can change it if you have a highly loaded server
# and don't need updates that often.
[bbdisplay]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	GROUP generators
	CMD bb-display.sh
	LOGFILE $BBSERVERLOGS/bb-display.log
	INTERVAL 1m

# "larrdcolumn" is responsible for updating the contents of the LARRD overview page, found on the
# "trends" column for each host. Since the set of graphs does not change very often, we run this
# less frequently than the normal webpage updates. We also make sure (with the "GROUP" setting)
# that they don't run simultaneously with the infocolumn and bbdisplay tasks.
[larrdcolumn]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	GROUP generators
	CMD bb-larrdcolumn --hobbitd --rrddir=$BBVAR/rrd --column=trends
	LOGFILE $BBSERVERLOGS/bb-display.log
	INTERVAL 15m

# "infocolumn" is responsible for updating the contents of the INFO pages, found on the
# "info" column for each host. Since the content does not change unless there is a
# configuration change, we update these less frequently than the normal webpage updates.
# We also make sure (with the "GROUP" setting) that they don't run simultaneously with
# the larrdcolumn and bbdisplay tasks.
[infocolumn]
	DISABLED
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	GROUP generators
	CMD bb-infocolumn --hobbitd --column=info
	LOGFILE $BBSERVERLOGS/bb-display.log
	INTERVAL 15m

# "bbnet" runs the bbtest-net tool to perform the network based tests - i.e. http, smtp, ssh, dns and
# all of the various network protocols we need to test.
[bbnet]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD bbtest-net --report --ping --checkresponse
	LOGFILE $BBSERVERLOGS/bb-network.log
	INTERVAL 5m

# "bbretest" picks up the tests that the normal network test consider "failed", and re-does those
# tests more often. This enables Big Brother to pick up a recovered network service faster than
# if it were tested only by the "bbnet" task (which only runs every 5 minutes). So if you have
# servers with very high availability guarantees, running this task will make your availability
# reports look much better.
[bbretest]
	ENVFILE /opt/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD $BBHOME/ext/bbretest-net.sh
	LOGFILE $BBSERVERLOGS/bb-retest.log
	INTERVAL 1m
Looks like 'larrddata' is clean. So far, only 'larrdstatus' is causing it to crash.
- Brian
On Tue, Mar 01, 2005 at 10:48:01AM -0800, Brian Lynch wrote:
OK... I enabled the modules one at a time until I reached 'larrdstatus'. When I enabled that module, hobbitd began crashing again.
OK, that is fairly interesting.
I do have some more code that I'd like you to test, but I need to get it into a state where I'm sure that all the debugging code actually works. Not exactly rocket science, I just need to go over it tonight.
So I'd like to send you a special test version of Hobbit - hopefully that will provide some clue as to what exactly is happening.
Have you been running a previous version of Hobbit without any problems?
Regards, Henrik
Henrik,

I confess I'm new to the Hobbit project. I've been running bbgen successfully for the past 2 years and just decided to test-drive your new server. I'd be happy to run a test version when you're ready.

Here's the update on disabling modules: I re-enabled all modules except the ones below, so all of the defaults are enabled except 'larrdstatus'. This runs fine with no core dumps. Adding larrdstatus back in causes hobbitd to die with signal 6 again.

DISABLED MODULES: bbstatus, bbdata, bbnotes, larrdstatus
- Brian
participants (2)
- brianlynch@gmail.com
- henrik@hswn.dk