I tried, but I still don't get an actual list of hosts, just the option for "ALL". The trends column has remained green, though. The history logs may have been created when I was starting and stopping Hobbit a few times during my initial testing.
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Friday, April 01, 2005 1:34 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] trends columns all turned purple
On Thu, Mar 31, 2005 at 06:04:14PM -0500, Deal, Richard wrote:
[trends column goes purple]
Has anyone else noticed that maint.pl lists no hosts, only "ALL"?
Neither of these shows up on any of the hosts I've tested. It sounds like there's some problem with the Hobbit daemon on your box. Could you try stopping Hobbit, then run "ps" to make sure all of the tasks have stopped, and then restart it?
Regards, Henrik
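Henrik's "stop Hobbit, check with ps, restart" suggestion above can be partly scripted. The sketch below only covers the middle step: it runs `ps -e -o comm=` and reports any command that looks like a Hobbit component. The process-name list (anything starting with "hobbit", plus bbgen, bbtest-net and bb-larrdcolumn) is an assumption based on a typical 4.0 server install, and the function name is hypothetical; adjust both for your setup.

```python
import os
import subprocess

# ASSUMPTION: names of Hobbit 4.0 server processes. Anything whose basename
# starts with "hobbit" (hobbitlaunch, hobbitd, hobbitd_channel, ...) counts,
# plus these non-"hobbit" helpers. Adjust for your installation.
EXTRA_NAMES = {"bbgen", "bbtest-net", "bb-larrdcolumn"}


def leftover_hobbit_processes(ps_output):
    """Return Hobbit-looking command names found in `ps -e -o comm=` output."""
    found = []
    for line in ps_output.splitlines():
        name = os.path.basename(line.strip())
        if name.startswith("hobbit") or name in EXTRA_NAMES:
            found.append(name)
    return found


if __name__ == "__main__":
    # "comm=" asks ps for just the command name, with an empty header.
    out = subprocess.run(["ps", "-e", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    left = leftover_hobbit_processes(out)
    if left:
        print("Still running:", ", ".join(sorted(set(left))))
    else:
        print("No Hobbit processes found; safe to restart.")
```

Run it after stopping Hobbit; only restart once it reports nothing left.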
I re-installed RC6 to get rid of the purple trends. The maint.pl host list showing only 'ALL' may still be a problem; it occurs in both Firefox and IE. It can be worked around by changing views: we typically monitor bb2.html, but if we switch to the Main view and try enable/disable, it usually works. I think this has only occurred since RC6.
Richard, are you also running Solaris 9?
~David
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
OK, I think there are two completely unrelated issues here.
In <424D7FE3.1040600 at mci.com> David Gore <David.Gore at mci.com> writes:
I re-installed RC6 to get rid of the purple trends.
The "trends" columns don't go purple by themselves; they go purple because the bb-larrdcolumn tool isn't updating them.
I'd like you to check for unusual messages in the /var/log/hobbit/bb-display.log file, and also for any core files left behind by bb-larrdcolumn. They should be in the ~/server/tmp/ directory, but please also check the ~/data/rrd/ directory.
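The core-file hunt Henrik asks for can be sketched as below. It walks the given directories, collects every file named "core", and runs `file` on each so you can see which program produced it (as in the transcripts later in this thread, e.g. "from 'hobbitd'"). The function name is hypothetical, and the script assumes it is run as the hobbit user so that `~` is the Hobbit home directory.

```python
import os
import subprocess


def find_core_files(roots):
    """Walk each root directory and return sorted paths of files named 'core'."""
    cores = []
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if "core" in filenames:
                path = os.path.join(dirpath, "core")
                if os.path.isfile(path):
                    cores.append(path)
    return sorted(cores)


if __name__ == "__main__":
    # ASSUMPTION: running as the hobbit user, so ~ is the Hobbit home.
    home = os.path.expanduser("~")
    roots = [os.path.join(home, "server", "tmp"),
             os.path.join(home, "data", "rrd")]
    for path in find_core_files(roots):
        # `file` reports which executable generated the core dump.
        res = subprocess.run(["file", path], capture_output=True, text=True)
        print(res.stdout, end="")
```

Later in the thread cores also turned up under ~/data/acks and ~/data/logs, so passing the whole home directory as a single root may be the safer search.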
The maint.pl host list showing only 'ALL' may still be a problem; it occurs in both Firefox and IE. It can be worked around by changing views: we typically monitor bb2.html, but if we switch to the Main view and try enable/disable, it usually works. I think this has only occurred since RC6.
The missing hosts in the maint.pl display suggest that you're still running the version of maint.pl that uses cookies to try to display only the hosts from the page you were on. This was removed in the 4.0 release precisely because it was causing problems. Could you check your maint.pl script and see if lines 432-433 look like this:
432 # open (HOBBITDLIST, "bb ".$BBENV{'BBDISP'}." \"hobbitdboard ".$filter."\" |");
433 open (HOBBITDLIST, "bb ".$BBENV{'BBDISP'}." hobbitdboard |");
Thanks, Henrik
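Checking those two lines by eye is easy on one server; on several, a small helper can apply the same test. This is a hypothetical sketch (the function name is not part of Hobbit) that relies only on the fragment Henrik quoted: in the fixed maint.pl, the active `open` of HOBBITDLIST calls hobbitdboard without `$filter`, and the filtered variant is commented out.

```python
import sys


def uses_unfiltered_board(maint_pl_text):
    """True if every active HOBBITDLIST/hobbitdboard line omits $filter."""
    calls = []
    for line in maint_pl_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out code, like the old filtered line 432
        if "HOBBITDLIST" in stripped and "hobbitdboard" in stripped:
            calls.append(stripped)
    # No matching line at all means this is not the expected maint.pl.
    return bool(calls) and all("$filter" not in c for c in calls)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "maint.pl"
    with open(path) as fh:
        ok = uses_unfiltered_board(fh.read())
    print("unfiltered hobbitdboard call" if ok else "old cookie-filtered version?")
```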
Yes, one core file:
hobbit@hobbit ~/server> find . -name core
./tmp/core
hobbit@hobbit ~/server> file tmp/core
tmp/core: ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
hobbit@hobbit ~/server> ls -al tmp/core
-rw------- 1 hobbit other 8322084 Apr 1 03:27 tmp/core
I checked, and the lines in maint.pl are correct. I am reinstalling 4.0.1 again and will let you know how it goes.
~David
New core and the purples are back:
hobbit@hobbit ~/server> find -name core -ls
141454 8024 -rw------- 1 hobbit other 8207396 Apr 1 22:08 ./tmp/core
hobbit@hobbit ~/server> file ./tmp/core
./tmp/core: ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
More cores...
hobbit@hobbit ~> find -name core -ls
141462 9000 -rw------- 1 hobbit other 9206820 Apr 1 22:37 ./server/tmp/core
843033 1784 -rw------- 1 hobbit other 1817864 Mar 2 08:20 ./data/acks/core
910789 464 -rw------- 1 hobbit other 465764 Apr 1 21:52 ./data/logs/core
hobbit@hobbit ~> file ./data/logs/core
./data/logs/core: ELF 32-bit MSB core file SPARC Version 1, from 'bb-larrdcolumn'
hobbit@hobbit ~> file ./server/tmp/core
./server/tmp/core: ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
On Fri, Apr 01, 2005 at 09:51:45PM +0000, David Gore wrote:
Yes, one core file:
hobbit@hobbit ~/server> find . -name core
./tmp/core
hobbit@hobbit ~/server> file tmp/core
tmp/core: ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
hobbit@hobbit ~/server> ls -al tmp/core
-rw------- 1 hobbit other 8322084 Apr 1 03:27 tmp/core
I checked, and the lines in maint.pl are correct. I am reinstalling 4.0.1 again and will let you know how it goes.
OK, so it does dump core.
To get some more info about this, you need the core file *and* the hobbitd binary that generated it. Then run
$ gdb bin/hobbitd tmp/core
[messages from gdb]
gdb> bt
to load the core file and the hobbitd binary into gdb (the GNU debugger). The "bt" command then prints a call trace of what the program was doing when it crashed; that is the first piece of information needed to find the bug.
Regards, Henrik
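The interactive session Henrik describes can also be run non-interactively, which is handy when the backtrace has to be pasted into a mail. This sketch is not part of Hobbit; it uses gdb's standard `-batch` and `-x` options (run the commands in a file, then exit) and assumes it is started from ~/server with the default paths bin/hobbitd and tmp/core. As Henrik notes, the binary must be the same one that produced the core.

```python
import os
import subprocess
import tempfile


def gdb_backtrace_command(binary, core, cmdfile):
    """Build a gdb invocation that runs the commands in cmdfile, then exits."""
    # -batch: exit after processing command files; -x: read commands from file.
    return ["gdb", "-batch", "-x", cmdfile, binary, core]


def backtrace(binary="bin/hobbitd", core="tmp/core"):
    """Return gdb's output for a 'bt' on the given binary and core file."""
    fd, cmdfile = tempfile.mkstemp(suffix=".gdb")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("bt\n")
        result = subprocess.run(gdb_backtrace_command(binary, core, cmdfile),
                                capture_output=True, text=True)
        return result.stdout + result.stderr
    finally:
        os.remove(cmdfile)


if __name__ == "__main__":
    print(backtrace())
```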
Here is the first core trace after re-installing and starting 4.0.1:
hobbit@hobbit ~/server> gdb bin/hobbitd tmp/core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
Core was generated by `hobbitd --restart=/export/home/hobbit/server/tmp/hobbitd.chk --checkpoint-file='.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Ultra-4/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Ultra-4/lib/libc_psr.so.1
#0 0xff19d3d4 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff19d3d4 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff135698 in abort () from /usr/lib/libc.so.1
#2 0x0001db68 in sigsegv_handler (signum=10) at sig.c:57
#3 <signal handler called>
David Gore (v965-3670) Enhanced Technology Support (ETS) Network Management Systems (NMS) IMPACT Transport Team Lead - SCSA, SCNA Page: 1-800-PAG-eMCI pin 1406090 Vnet: 965-3676
Based on the core trace, I decided to remove all files from ~/server/tmp/. I don't think we had ever removed the files left in ~/server/tmp/ by RC1, RC2 and RC4-RC6, so perhaps hobbitd was choking on something there (hobbitd.chk?). It has now been running cleanly for more than an hour; normally it dumps core, and the trends turn purple, within the first hour. I did have to re-disable my disabled hosts, of course. So far it looks good! Thanks for your help, Henrik.
~David
Well, it's back after a restart. I will upload the cores; please delete them when you are finished. I will probably try removing the checkpoint file and restarting, but even if that works, it is not a real solution.
~David
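If you try David's next step, one option is to move the checkpoint file aside rather than delete it, so it can still be inspected or sent along with the cores. A minimal sketch, assuming hobbitd is already stopped and the checkpoint lives at ~/server/tmp/hobbitd.chk (the path in the `--restart` option of the core trace above); the function name and the timestamped ".bad-" suffix are hypothetical. As David found after clearing ~/server/tmp/, state such as disabled hosts must then be re-entered.

```python
import os
import time


def quarantine_checkpoint(tmpdir, name="hobbitd.chk"):
    """Rename tmpdir/name to a timestamped '.bad-' file; return new path or None."""
    src = os.path.join(tmpdir, name)
    if not os.path.exists(src):
        return None
    dst = "%s.bad-%s" % (src, time.strftime("%Y%m%d%H%M%S"))
    os.rename(src, dst)
    return dst


if __name__ == "__main__":
    # Run only while hobbitd is stopped, so it cannot rewrite the file.
    moved = quarantine_checkpoint(os.path.expanduser("~/server/tmp"))
    print("moved to %s" % moved if moved else "no checkpoint file found")
```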
participants (3)
- David.Gore@mci.com
- henrik@hswn.dk
- rdeal@tigr.org