Problems since upgrading from bbgen to Hobbit
Hi Henrik
I upgraded this morning from bbgen 3.6 to Hobbit 4.1.2p1 (with ~ 1000 hosts), and have a few problems :
- I can't display the history of one of my tests:
http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=1... returns an internal server error
[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh, referer: http://10.50.80.44/hobbit-cgi/bb-hostsvc.sh?HOSTSVC=cronos.AHD&IP=10.50.80.4...
- I had a coredump with bbgen :
$ file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, SVR4-style, from 'bbgen'
$ gdb /BB/hobbit/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `bbgen --recentgifs --subpagecolumns=2 --nopropred=AHD --subpagecolumns=2 --page'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x40075941 in kill () from /lib/libc.so.6
#1 0x400756e5 in raise () from /lib/libc.so.6
#2 0x40076a86 in abort () from /lib/libc.so.6
#3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4 <signal handler called>
#5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
(gdb) quit
- Some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should. Even if I add a process to check, it doesn't appear in Hobbit. Could it be a compatibility problem? (My tests with a Quest BB client 3.01 running on XP SP2 ran fine.)
- One of my custom network tests keeps getting "Unexpected service response". Its definition is this:
[ica]
expect "ICA"
port 1494
- I get a lot of errors in page.log:
2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
Do I have to use "TIME=123456:0000:0700" ?
Many thanks in advance for your help.
Regards,
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
- I can't display the history of one of my test :
http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=1... returns an internal server error
[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh
Most likely there is some sort of malformed entry in that history file which the Hobbit histlog CGI cannot handle. If you could send me that file - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
- I had a coredump with bbgen : #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57 #4 <signal handler called> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
Weird. Does it happen every time you run bbgen ? I hope not.
- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should Even if I add a process to check, it doesn't appear in Hobbit. Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine).
Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. On a Linux box, you can use
tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
to grab only the traffic between Hobbit and this client.
- one of my custom network test keeps getting "Unexpected service response" Its definition is this : [ica] expect "ICA" port 1494
Most likely, it isn't getting any response back. What does "telnet HOSTNAME 1494" give you ?
- I get a lot of errors in page.log : 2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
Do I have to use "TIME=123456:0000:0700" ?
I think so, yes.
Regards, Henrik
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
- I can't display the history of one of my test :
http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=1... returns an internal server error
[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh
Most likely there is some sort of malformed entry in that history file which the Hobbit histlog CGI cannot handle. If you could send me that file - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
First of all, let me thank you for your help.
I've sent the cronos.AHD file to you by email.
- I had a coredump with bbgen : #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57 #4 <signal handler called> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
Weird. Does it happen every time you run bbgen ? I hope not.
Once only. I had some "errors" in bb-hosts (duplicate subpage names, which seemed to confuse bb-findhost.cgi). Since then I haven't had any new coredumps.
- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should Even if I add a process to check, it doesn't appear in Hobbit. Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine).
Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. If a Linux box, you can use tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP to grab only the traffic between Hobbit and this client.
I'm looking into it, with tcpdump and tcpflow.
- one of my custom network test keeps getting "Unexpected service response" Its definition is this : [ica] expect "ICA" port 1494
Most likely, it isn't getting any response back. What does "telnet HOSTNAME 1494" give you ?
It works as expected :
$ telnet xx.xx.xx.xx 1494
Trying xx.xx.xx.xx...
Connected to xx.xx.xx.xx
Escape character is '^]'.
ICA
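Since a terminal only shows printable characters, it may be worth looking at the raw bytes the server sends before concluding that the banner is exactly "ICA". The following is a minimal Python sketch, assuming the service sends its banner unprompted right after the connection is opened. The helper names `read_banner` and `banner_report` are hypothetical, and whether the banner really carries extra bytes in front of "ICA" is something to check, not something this thread confirms.

```python
import socket


def read_banner(host, port, timeout=5.0, nbytes=64):
    """Connect to host:port and return the first bytes the server sends, unmodified."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            return sock.recv(nbytes)
        except socket.timeout:
            # The server accepted the connection but sent nothing in time.
            return b""


def banner_report(banner, expected=b"ICA"):
    """Say whether the banner starts with, merely contains, or lacks the expected bytes."""
    if banner.startswith(expected):
        return "starts with expected bytes"
    if expected in banner:
        return "contains expected bytes at offset %d" % banner.index(expected)
    return "expected bytes not found"
```

For example, `data = read_banner("xx.xx.xx.xx", 1494)` followed by `print(repr(data), banner_report(data))` shows any non-printable bytes as escapes. If "ICA" only appears at a non-zero offset, a test that expects the response to begin with "ICA" could plausibly fail even though telnet looks fine.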
- I get a lot of errors in page.log : 2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
Do I have to use "TIME=123456:0000:0700" ?
I think so, yes.
Thanks, it's working now.
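With many entries to convert, the rewrite from "1-6" to "123456" can be scripted. This is a small Python sketch under the assumption, taken only from the exchange above, that the day field must list each day digit explicitly and that days are numbered 0 to 6. The function names are hypothetical, and other day notations are passed through unchanged.

```python
def expand_days(days):
    """Expand a day range such as "1-6" into explicit digits ("123456").

    Fields without a "-" (for example "135") are returned unchanged.
    """
    if "-" not in days:
        return days
    start, end = days.split("-", 1)
    first, last = int(start), int(end)
    if not (0 <= first <= last <= 6):
        raise ValueError("day range out of order or outside 0-6: %r" % days)
    return "".join(str(d) for d in range(first, last + 1))


def rewrite_timespec(spec):
    """Rewrite "DAYS:START:END" so that DAYS lists every day digit explicitly."""
    days, start, end = spec.split(":")
    return ":".join([expand_days(days), start, end])
```

Calling `rewrite_timespec("1-6:0000:0700")` gives `"123456:0000:0700"`, the form that worked here.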
I have some more questions :
- When stopping Hobbit, is it normal to get this kind of error in history.log:
2005-11-29 10:51:28 Tried to down BOARDBUSY: Invalid argument
2005-11-29 10:51:28 Worker process died with exit code 0, terminating
2005-11-29 10:51:28 Could not get shm of size 262144: No such file or directory
2005-11-29 10:51:28 Channel not available
- I'm running a few scripts with a long heartbeat (eg. "status+1440"), are these errors in history.log normal ?
2005-11-29 11:00:04 Will not update /BB/hobbit/data/hist/HQSTERIACRAUO.NetBackup - color unchanged (green)
2005-11-29 12:00:14 Will not update /BB/hobbit/data/hist/cronos.ebuilds - color unchanged (green)
2005-11-29 12:00:19 Will not update /BB/hobbit/data/hist/hades.ebuilds - color unchanged (green)
2005-11-29 12:01:08 Will not update /BB/hobbit/data/hist/ve1stnet3.photos - color unchanged (green)
2005-11-29 13:33:33 Will not update /BB/hobbit/data/hist/prod.file - color unchanged (green)
- What do these errors in page.log mean?
2005-11-29 14:06:31 hobbitd_alert: Got message 4723, expected 4722
2005-11-29 14:00:15 Worker process died with exit code 0, terminating
2005-11-29 13:42:36 Stale alert for antivirus-1:http dropped
- and these in hobbitlaunch.log ? Should I run hobbitd_alert in debug mode ?
2005-11-29 12:48:32 Task bbpage terminated, status 1
2005-11-29 13:21:41 Task bbpage terminated, status 1
2005-11-29 13:32:07 Task bbpage terminated, status 1
2005-11-29 13:32:32 Task bbpage terminated, status 1
2005-11-29 13:37:44 Task bbpage terminated, status 1
2005-11-29 13:59:52 Task bbpage terminated, status 1
2005-11-29 14:00:15 Task bbpage terminated, status 1
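To see how often each task dies, and when, the hobbitlaunch.log lines can be tallied with a short script. This is a Python sketch that assumes every termination line has exactly the shape shown above ("TIMESTAMP Task NAME terminated, status N"). The function name and the log path in the usage note are placeholders.

```python
import re
from collections import Counter

# Matches lines such as:
# 2005-11-29 12:48:32 Task bbpage terminated, status 1
TERMINATED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Task (?P<task>\S+) terminated, status (?P<status>-?\d+)$"
)


def tally_terminations(lines):
    """Count termination lines per (task, status) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = TERMINATED.match(line.strip())
        if match:
            counts[(match.group("task"), int(match.group("status")))] += 1
    return counts
```

A possible use is `with open("hobbitlaunch.log") as f: print(tally_terminations(f))`. The timestamps of the terminations can then be compared with entries in the other logs written at the same second.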
Thanks again !
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should Even if I add a process to check, it doesn't appear in Hobbit. Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine).
Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. If a Linux box, you can use tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP to grab only the traffic between Hobbit and this client.
Well, I have finally found the reason...
The Quest BB Client 3.01 provides a useful "Start Monitoring After" feature. When it is set to anything other than 0, changes made with the bbntcfg.exe control panel are not taken into account. If I don't use the "Start Monitoring After" feature, everything works fine.
Let's open a trouble ticket at Quest Software...
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
- I had a coredump with bbgen : #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57 #4 <signal handler called> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
Weird. Does it happen every time you run bbgen ? I hope not.
It occurred twice today:
$ file ~/server/tmp/core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, SVR4-style, from 'bbgen'
$ gdb ~/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `bbgen --recentgifs --subpagecolumns=2 --report --nopropred=AHD --subpagecolumns'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x40075941 in kill () from /lib/libc.so.6
#1 0x400756e5 in raise () from /lib/libc.so.6
#2 0x40076a86 in abort () from /lib/libc.so.6
#3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4 <signal handler called>
#5 main (argc=11, argv=0xbfffe3b4) at bbgen.c:586
(gdb)
I'm running bbgen twice :
- once every 10 seconds for my "main" Hobbit map
- once every minute for my "customers" Hobbit map.
Could that be a problem?
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
participants (2)
- frederic.mangeant@steria.com
- henrik@hswn.dk