The server is Solaris 10.
Yesterday I shut down hobbit, moved to another disk, created a soft link, started up again. Now this shows up in hobbitfetch.log:
2008-12-31 08:49:52 Caught TERM signal, terminating
2008-12-31 08:53:49 Connection lost during connect/write to 10.1.1.184:1984 (req 132): Broken pipe
2008-12-31 08:53:56 Connection lost during connect/write to 10.1.1.183:1984 (req 246): Broken pipe
2008-12-31 08:53:56 Out of sockets (req 280)
2008-12-31 08:53:56 Out of sockets (req 281)
2008-12-31 08:53:56 Out of sockets (req 282)
2008-12-31 08:53:56 Out of sockets (req 283)
2008-12-31 08:53:56 Out of sockets (req 284)
2008-12-31 08:53:56 Out of sockets (req 285)
All my hobbitfetch clients are purple.
How do I identify what sockets are required? It sounds like an OS resource issue.
Thanks
Craig
On Wed, Dec 31, 2008 at 9:45 AM, Craig Cook <Craig.Cook at gpi.com> wrote:
How do I identify what sockets are required? It sounds like an OS resource issue.
Have you tried running 'execsnoop' and 'opensnoop' (from the DTraceToolkit)? They should tell you which process is trying to connect or open a new socket, if I am not mistaken.
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
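One way to test the "OS resource" theory is to look at the open-file limit, since sockets count against it. This is only a sketch, and it assumes that "Out of sockets" means hobbitfetch ran into its per-process file descriptor limit, which the log does not confirm. The function name fd_limit_report is illustrative.

```shell
#!/bin/sh
# Report the soft and hard open-file limits of the current shell.
# Processes started from this shell inherit these limits.
fd_limit_report() {
    soft=`ulimit -S -n`
    hard=`ulimit -H -n`
    echo "soft=$soft hard=$hard"
}

fd_limit_report
```

If the soft limit turns out to be low, raising it with 'ulimit -n' in the shell that starts Hobbit may be worth a try. On Solaris, 'plimit <pid>' and 'pfiles <pid>' show the limits and the open descriptors of a process that is already running.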
Something on the network blew up, a router or a switch. Of course hobbit went purple. It was a Thursday afternoon and it was in the Network Gods' hands at that point. They fixed it that night, but I still had the purple condition in the morning. I checked both the server and the clients (~100) and everything was running. I restarted both anyway and waited. Went home for the weekend. Came in Monday and still had the purple plague.
Did some reading, found this script on the hobbit mailing list and ran it.
/usr/lib/hobbit/server
#!/usr/bin/ksh
# Drop every status that is currently purple on the local Hobbit server.
HBBB="/usr/lib/hobbit/server/bin/bb --debug"
# List purple statuses as "hostname|testname", one per line.
${HBBB} 127.0.0.1 "hobbitdboard color=purple fields=hostname,testname" |
while read L; do
    HOST=`echo "$L" | cut -d'|' -f1`
    TEST=`echo "$L" | cut -d'|' -f2`
    ${HBBB} 127.0.0.1 "drop $HOST $TEST"
done
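Because the script above sends its drop commands immediately, a dry run is a cheap safeguard. The sketch below only prints the commands it would send. The name print_drops is illustrative, and the input format (one "hostname|testname" pair per line) is assumed from the hobbitdboard fields requested above.

```shell
#!/bin/sh
# Dry run: read "hostname|testname" lines on stdin and print the
# drop command for each one instead of sending it.
print_drops() {
    while IFS='|' read -r host tname; do
        # Skip blank lines and lines without a test name.
        [ -n "$host" ] && [ -n "$tname" ] || continue
        echo "drop $host $tname"
    done
}

# Example with made-up hosts; in real use, pipe the hobbitdboard output in.
printf 'web01|cpu\nweb01|disk\n\n' | print_drops
```

Reviewing that list first would have shown that cpu, disk, procs and the other client-side tests were about to be removed.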
Purple condition is gone. But so are many of my monitoring tasks such as CPU, processes, file system status, and more. Nothing is reporting except 'conn', 'info', and 'trends', even after several days.
What do I need to do to hobbit and/or the client to have these monitored again?
THANK YOU! /tg
On Wednesday 31 December 2008, Tim Grzechowski wrote:
What do I need to do to hobbit and/or the client to have these monitored again?
What your script did was delete all checks that were purple. A purple check is a check that has not been updated for a while, so it can be cleared in two ways: by deleting the check, as you did, or by sending a new status. In your case, you need to find out why the status is not being sent to the hobbit server. Check the client logs. You can also try a telnet to the hobbit port (default 1984) from the client to the hobbit server to see if there is a network problem. Also, check the ghost client list (it can be found in the menu on the hobbit server).
Happy new year, and I wish you all a good monitoring time,
Stef
Stef,
For some reason your email makes Outlook burp (go figure). Comes up blank and I get a Windows error that says:
From: Stef Coene [mailto:stef.coene at docum.org] Sent: Thursday, January 01, 2009 2:50 AM To: hobbit at hswn.dk Subject: Re: [hobbit] Purple Problems
Stef,
For some reason your email comes up blank and I get a Windows alert that says:
Can you, or somebody else, please repost.
Thanks!
/tg
The Ghost Client list is blank and blue (Disabled?).
Checked eight clients out of the ~100 and all of them are able to telnet to the server on port 1984 without issue.
Shut down the client on a client-only machine, copied /dev/null over hobbitclient.log and clientlaunch.log, then ran 'runclient.sh start'. hobbitclient.log is empty. clientlaunch.log has two lines that show the client has started.
No change.
I shut down the client (on the server) and the hobbit server itself. Checked and cleared the logs. Restarted the server and, after a couple of minutes, the client on the server as well.
No change. Still not showing any of the pertinent info.
On the server I checked client-local.cfg and bb-hosts in hobbit's /etc directory. They are fine, and they were last accessed weeks before this issue popped up.
Any other ideas?
/tg
P.S. All the file systems have ample available space.
On Friday 02 January 2009, Tim Grzechowski wrote:
Any other ideas?
Make sure BBDISP is correct in hobbitclient.cfg.
Check the files in the tmp directory in the hobbit client directory. There should be some files created by the client:
    logfetch.client_name.cfg
    msg.client_name.txt
    hobbit_vmstat.client_name.25687
    logfetch.client_name.status
Stef
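The tmp-directory check can be scripted so it is easy to repeat across many clients. A minimal sketch: check_msg_file is an illustrative name, and it only looks at the msg.<hostname>.txt file, on the assumption that an existing, non-empty message file means the client is at least collecting data.

```shell
#!/bin/sh
# Report whether the client message file exists and is non-empty.
# $1 = client tmp directory, $2 = client host name
check_msg_file() {
    f="$1/msg.$2.txt"
    if [ -s "$f" ]; then
        echo "ok: $f"
    else
        echo "missing or empty: $f"
        return 1
    fi
}

# Typical use on a client (path assumed from the client layout):
#   check_msg_file ~/client/tmp "`uname -n`"
```

A message file that exists locally while the server shows nothing suggests the data is being sent somewhere else or not at all, which points back at BBDISP.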
Try running the main client script by hand to make sure none of the commands hang (from the client ~/client/bin directory):
./bbcmd sh -x hobbitclient-<your OS>.sh # replace <your OS> with the proper OS
If ~/client/tmp/msg.<hostname>.txt is being created, then there is probably no need to execute the line above. You checked for ghosts as others suggested, right?
~David
I assume your Hobbit webpages are being updated? (Check the timestamp in the upper-right corner.)
Is hobbitd_client running on the server?
On one of the clients, log in as the user running the Hobbit client and run the "bbcmd" tool. You'll get a new shell prompt. Do an
    echo $BB
    echo $BBDISP
and check that these point to the "bb" utility on the client and to the IP address of your Hobbit server.
Then run
    $BB $BBDISP "ping"
You should get a response back from the Hobbit server.
Next, run
    $BB $BBDISP "status $MACHINE.purpletest green Checking status"
This sends a "purpletest" status message to the Hobbit server. If everything works OK, you should get a "purpletest" status column (in green) for this client host the next time the Hobbit webpages are updated.
Let us know what you find out.
Regards, Henrik
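Before sending the test messages, it can help to confirm that the bbcmd shell really has the variables set. The sketch below is a hypothetical helper (check_bb_env is not part of Hobbit) that prints each variable used in the commands above and returns a non-zero status if any is missing.

```shell
#!/bin/sh
# Verify that the variables bbcmd should have set are present
# before trying to talk to the server.
check_bb_env() {
    rc=0
    for name in BB BBDISP MACHINE; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        else
            echo "$name=$value"
        fi
    done
    return $rc
}

# Run inside the shell started by bbcmd:
#   check_bb_env && $BB $BBDISP "ping"
```

Comparing the printed BBDISP against the server's real address on a few clients is a quick way to spot a setting that has been overridden.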
Thanks Henrik! Unbeknownst to me there was some sort of turf war going on... somebody changed BBDISP via an "include" line in the hobbitclient.cfg file, so the clients were pointing at a different server.
We added our server to BBDISPLAYS and we are happily co-existing now.
Thanks again! /tg
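For anyone landing here with the same problem, a setup where clients report to more than one server typically looks something like the fragment below. This is a sketch, not the poster's actual configuration: the addresses are placeholders, and it relies on the documented behaviour that BBDISPLAYS is only used when BBDISP is set to 0.0.0.0.

```shell
# hobbitclient.cfg fragment (addresses are placeholders)
BBDISP="0.0.0.0"                   # 0.0.0.0 tells the client to use BBDISPLAYS instead
BBDISPLAYS="10.1.1.10 10.1.1.20"   # every server listed here receives the client reports
```

Any "include" line in hobbitclient.cfg is worth checking as well, since a later definition in an included file can change these values.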
participants (6)
- Craig.Cook@gpi.com
- david.gore@verizonbusiness.com
- henrik@hswn.dk
- stef.coene@docum.org
- tim.grzechowski.osv@fedex.com
- vadud3@gmail.com