Last email for a while, i promise; i'm chainsmoking packets at this point. But i found this:
2005-11-01 14:14:20 TCP tests completed normally
2005-11-01 14:14:20 Execution of 'fping -Ae' failed with error-code 99
2005-11-01 14:14:20 Sending results for service conn
Okay, it can't find fping. But...
hobbit at randomaccess ~/server/bin $ more ../etc/hobbitserver.cfg |grep fping
Make sure the path includes the directories where you have fping, mail and (optionally) ntpdate installed,
FPING="/usr/sbin/fping" # Path and options for the 'fping' program.
hobbit at randomaccess ~/server/bin $ /usr/sbin/fping -Ae brassai
10.10.10.15 is alive (0.15 ms)
hobbit at randomaccess ~/server/bin $
So it should be finding fping just fine, and fping is working. The path is in hobbitserver.cfg:
Make sure the path includes the directories where you have fping, mail and (optionally) ntpdate installed,
as well as the BBHOME/bin directory where all of the Hobbit programs reside.
PATH="/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/hobbit/server/bin"
...
For bbtest-net
...
FPING="/usr/sbin/fping" # Path and options for the 'fping' program.
and
[bbnet]
ENVFILE /home/hobbit/server/etc/hobbitserver.cfg
So, by all the above: fping is functional, it is accessible by the 'hobbit' user, it can reach the clients, it is in the PATH, it is defined in the ENVFILE bbnet is using.
So what's gone wrong??
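
One thing the checks above do not cover is whether the command still works with only the environment from the env file, rather than my login shell's. Worth noting that the log quotes the command as 'fping -Ae', not '/usr/sbin/fping -Ae', so it may be worth ruling out that the hand-run bbtest-net never saw the FPING value at all. A rough sketch of that check follows; the helper names env_value and run_clean are made up for illustration, and it assumes the env file uses one NAME="value" assignment per line:

```shell
#!/bin/sh
# Print the value assigned to a variable in a NAME="value" style env file.
# Assumption: one assignment per line, value in double quotes; anything
# after the closing quote (such as a trailing comment) is ignored.
env_value() {
    # $1 = env file, $2 = variable name
    sed -n "s/^[[:space:]]*$2=\"\([^\"]*\)\".*/\1/p" "$1" | tail -n 1
}

# Run a command with an otherwise empty environment and report its exit status.
run_clean() {
    # $1 = PATH to use; remaining arguments = command and its arguments
    clean_path=$1
    shift
    env -i PATH="$clean_path" "$@"
    status=$?
    echo "exit status: $status"
    return "$status"
}

# Example use (paths and host name taken from the message above):
#   cfg=/home/hobbit/server/etc/hobbitserver.cfg
#   run_clean "$(env_value "$cfg" PATH)" $(env_value "$cfg" FPING) -Ae brassai
```

If the full path from FPING works here but a bare 'fping' does not, that would point at the environment rather than at fping itself. An exit status of 127 from env generally means the command was not found on the given PATH.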
Rob Munsch wrote:
Since ssh, ldap, and dns are tests run from the serverside (cpu etc remaining green indicates the clients are running and communicating OK, right?), i ran
./bbtest-net --concurrency=50 --checkresponse --no-update --timing --debug
Now, i can ping and ssh to all clients from server just fine. But i see this:
2005-11-01 14:14:20 Adding to combo msg: status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
Service conn on brassai is not OK : Host does not respond to ping
System unreachable for 3 poll periods (56 seconds)
Aha. Since the ping test fails, why test other net services? So now it makes sense; the net tests are not being run, hence the purple.
a'course, i don't know why the nettest is suddenly unable to ping anything. It is getting the right IPs internally:
2005-11-01 14:14:20 Got DNS result for host doisneau : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host brassai : 10.x.x.x
2005-11-01 14:14:20 Got DNS result for host moadib : 10.x.x.x
and i thought cranking the concurrency way down might help, but apparently it doesn't.
So, i'm glad i found the cause... now i just need to find out the cause's cause. o_O
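
One way to hunt for the cause's cause in the --debug output is to pull out every line where an external command failed, instead of reading the whole thing. A minimal sketch, assuming the output has been saved to a file and that failures are logged in the "Execution of '...' failed with error-code N" form seen in the bbtest-net log; the function name failed_commands is hypothetical:

```shell
#!/bin/sh
# Print "COMMAND EXITCODE" for every failed external command in a saved log.
# Assumption: failures appear as
#   Execution of 'COMMAND' failed with error-code N
failed_commands() {
    # $1 = path to the saved debug output
    sed -n "s/.*Execution of '\(.*\)' failed with error-code \([0-9][0-9]*\).*/\1 \2/p" "$1"
}
```

Run against a file containing the fping failure line, this prints "fping -Ae 99", which at least shows exactly what command string was executed.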
--
Rob Munsch
Systems Analyst, Solutions for Progress
http://www.solutionsforprogress.com