Since ssh, ldap, and dns are tests run from the server side (cpu etc. remaining green indicates the clients are running and communicating OK, right?), I ran
./bbtest-net --concurrency=50 --checkresponse --no-update --timing --debug
Now, I can ping and ssh to all clients from the server just fine. But I see this:
2005-11-01 14:14:20 Adding to combo msg: status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
Service conn on brassai is not OK : Host does not respond to ping
System unreachable for 3 poll periods (56 seconds)
Aha. Since the ping test fails, why test other net services? So now it makes sense; the net tests are not being run, hence the purple.
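For anyone who wants to pull the failing hosts out of a long --debug run, here is a rough Python sketch. It assumes the debug lines look like the "status brassai.conn red" line quoted above; the function name failed_conn_hosts is made up for illustration and is not part of Hobbit.

```python
import re

# Matches the "status <host>.conn <color>" fragment seen in the debug output.
# Assumption: the host name contains no whitespace and is followed by ".conn".
CONN_STATUS = re.compile(r"status\s+(\S+)\.conn\s+(\w+)")

def failed_conn_hosts(lines):
    """Return a sorted list of hosts whose conn status was reported red."""
    hosts = set()
    for line in lines:
        # A single line may carry the status fragment more than once.
        for host, color in CONN_STATUS.findall(line):
            if color == "red":
                hosts.add(host)
    return sorted(hosts)
```

Feeding it the saved output of the bbtest-net run (one list entry per line) gives the hosts for which the ping test failed, which are the ones whose other net tests end up purple.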
A'course, I don't know why the nettest is suddenly unable to ping anything. It is getting the right IPs internally:
2005-11-01 14:14:20 Got DNS result for host doisneau : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host brassai : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host moadib : 10.x.x.x
And I thought cranking the concurrency way down might help, but apparently it doesn't.
So, I'm glad I found the cause... now I just need to find out the cause's cause. o_O
Rob Munsch wrote:
There are no entries in the network log since 10/28. Hobbit is running on the server, and the clients are running on the various clients.
CPU, Memory, Disk and Procs all remain green! SSH, ldaps, and dns on the clients are purple.
On the hobbit server itself, bbd is purple. Everything else is green. Network connectivity between all clients > server is functional.
I don't get it...
Henrik Stoerner wrote:
On Mon, Oct 31, 2005 at 05:32:44PM -0500, Rob Munsch wrote:
Consider the below. Approx. 25 minutes ago, across all monitored systems, all net monitored services - ssh, ldaps and dns - went to purple. They are still up, running, and just fine in every respect. The status message is even the same as when it was showing green. But now every ssh, ldaps and dns light is purple.
Purple is an indication that some part of your monitoring system has stopped.
All of the purple ones are network services? Then it sounds as if your network tests have stopped running. Check the ~hobbit/server/logs/bb-network.log file for any errors.
Regards, Henrik
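A quick way to tell whether the network test log has gone quiet is to look at when it was last written. The Python sketch below is only an illustration: the helper names and the 600-second threshold are assumptions, not Hobbit settings, and the path to pass in would be the bb-network.log file mentioned above.

```python
import os
import time

def log_age_seconds(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)

def is_stale(path, max_age_seconds=600, now=None):
    """True if the log has not been written within max_age_seconds.

    The 600-second default is an arbitrary placeholder; pick something
    larger than the interval at which the network tests normally run.
    """
    return log_age_seconds(path, now) > max_age_seconds
```

Calling is_stale() on the expanded log path returns True when nothing has been appended recently, which matches the "no entries since 10/28" symptom.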
-- Rob Munsch Systems Analyst, Solutions for Progress http://www.solutionsforprogress.com