turning up (way up) debugging for the conn tests
Hi,
I've been trying to debug a seemingly false conn failure between one of the interfaces of my NetApp filer and my Hobbit server. This problem has persisted for a very long time, and I'm just not able to narrow down or capture enough information to find out what is causing it.
I'm using Hobbit 4.1.2p1 on my server, and Data ONTAP 7.0.4 on my FAS3020c filers.
My hobbit server is configured to monitor 2 IPs for 2 different interfaces on the NetApp filer. At seemingly random intervals, the conn tests will fail for one of those interfaces, even though NFS continues to work just fine out of that interface. After the interface "fails", it'll either flap between green and red, or just stay red for 30 mins, an hour, sometimes 4 hours. There does not appear to be a regular pattern or time period for this behavior.
Since tcpdump became my new buddy, I've noticed a few behaviors between the hobbit server and the "failing" filer interface that I'd like to better understand.
- How many pings are sent by fping to each host, by default? If I understand what the man page says, fping will send several before giving up. If fping gets a reply from the first, does it continue to send more?
- How can I turn up the debugging output for the conn tests, beyond this line in my [bbnet] config in hobbitlaunch.cfg:
CMD bbtest-net --report --ping --checkresponse --debug --timing
Is there one I can add for bbretest, since it'll be handling the network tests after the first conn failure?
- Would a hobbit network test ever initiate a connection using UDP on a high numbered port, like 37383?
Thanks for any help on this.
Tom
On Mon, Jun 19, 2006 at 04:08:06PM -0400, Tom Georgoulias wrote:
> - How many pings are sent by fping to each host, by default? If I understand what the man page says, fping will send several before giving up. If fping gets a reply from the first, does it continue to send more?
fping sends 4 ping probes: One initial probe, and 3 retries.
> - How can I turn up the debugging output for the conn tests, beyond this line in my [bbnet] config in hobbitlaunch.cfg:
> CMD bbtest-net --report --ping --checkresponse --debug --timing
There's no more debugging to add. The --debug option should leave the fping logs in the ~hobbit/server/tmp/ directory, so any error output from fping will be saved there.
> Is there one I can add for bbretest, since it'll be handling the network tests after the first conn failure?
The bbretest-net.sh script runs bbtest-net, so you can use the same options there. If you define the FPING setting with debug options for fping, then those will also be used by the re-tests.
> - Would a hobbit network test ever initiate a connection using UDP on a high numbered port, like 37383?
No.
Regards, Henrik
Henrik Stoerner wrote:
> On Mon, Jun 19, 2006 at 04:08:06PM -0400, Tom Georgoulias wrote:
>> - How many pings are sent by fping to each host, by default? If I understand what the man page says, fping will send several before giving up. If fping gets a reply from the first, does it continue to send more?
> fping sends 4 ping probes: One initial probe, and 3 retries.
Is it correct to say that as soon as fping gets a reply back from the server, it considers that server available and marks it as such? Is it also correct to conclude that it takes 4 unanswered pings before Hobbit will consider a system unreachable?
Tom
On Tue, Jun 20, 2006 at 12:34:51PM -0400, Tom Georgoulias wrote:
> Henrik Stoerner wrote:
>> On Mon, Jun 19, 2006 at 04:08:06PM -0400, Tom Georgoulias wrote:
>>> - How many pings are sent by fping to each host, by default? If I understand what the man page says, fping will send several before giving up. If fping gets a reply from the first, does it continue to send more?
>> fping sends 4 ping probes: One initial probe, and 3 retries.
> Is it correct to say that as soon as fping gets a reply back from the server, it considers that server available and marks it as such? Is it
Yes.
> also correct to conclude that it takes 4 unanswered pings before Hobbit will consider a system unreachable?
Hobbit just uses whatever fping reports, so since fping takes 4 pings to deem a host unreachable - yes, that is correct.
Regards, Henrik
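The probe logic confirmed above (stop at the first reply, give up after four unanswered probes) can be modelled in a few lines. This is a minimal Python sketch, not fping's actual code. It assumes fping's documented defaults of that era (retry limit `-r 3`, initial timeout `-t 500` ms, backoff factor `-B 1.5`), which may differ in your build or be overridden by the FPING setting in Hobbit. The names `probe_host`, `RETRIES`, `INITIAL_TIMEOUT_MS` and `BACKOFF` are hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed fping defaults (check `man fping` for your build):
# retry limit 3, initial timeout 500 ms, backoff factor 1.5.
RETRIES = 3
INITIAL_TIMEOUT_MS = 500.0
BACKOFF = 1.5


def probe_host(reply_on: Callable[[int], bool]) -> Tuple[str, List[float]]:
    """Model fping's per-host decision: 1 initial probe + RETRIES retries.

    reply_on(n) says whether probe n (0-based) gets an echo reply.
    Returns the verdict and the timeout allowed for each probe sent.
    """
    timeout = INITIAL_TIMEOUT_MS
    timeouts: List[float] = []
    for attempt in range(1 + RETRIES):
        timeouts.append(timeout)
        if reply_on(attempt):
            # The first reply is enough: no further probes are sent.
            return "alive", timeouts
        timeout *= BACKOFF  # wait longer before giving up on the next probe
    return "unreachable", timeouts


if __name__ == "__main__":
    # First probe lost, first retry answered: host is alive after 2 probes.
    print(probe_host(lambda n: n == 1))
    # Host never answers: all 4 probes go out before it is marked unreachable.
    verdict, timeouts = probe_host(lambda n: False)
    print(verdict, timeouts, sum(timeouts))
```

With these assumed defaults, a host that never answers is declared unreachable only after roughly four seconds of waiting (500 + 750 + 1125 + 1687.5 ms), while a reply to any one of the four probes is enough to mark it alive.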
Henrik Stoerner wrote:
>> - Would a hobbit network test ever initiate a connection using UDP on a high numbered port, like 37383?
> No.
BTW, I figured this out. The UDP traffic is a result of having the "trace" option enabled in bb-hosts. The traceroute sends a UDP probe to port 33435 on the failed server, then follows up with UDP probes to ports 33436 and 33437. The port I listed in my message above was the source port on the Hobbit server, which varied each time and thus didn't match anything obvious.
I'm just throwing this out there in case someone else runs a tcpdump on their hobbit server, with the "trace" option in effect, and can't figure out why some weird UDP traffic shows up after a system fails the conn test! ;)
Tom
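The destination ports in that capture follow from how traceroute numbers its UDP probes. Below is a minimal Python sketch, assuming the classic traceroute defaults (base port 33434, set with `-p`; three probes per hop, set with `-q`) and that the port is bumped by one before every probe. `probe_ports` is a hypothetical helper, not part of Hobbit or traceroute. In the classic LBL implementation the source port is derived from the process ID, which would fit a source port that changes on every run; other traceroute implementations may choose it differently.

```python
from typing import List, Tuple

BASE_PORT = 33434   # assumed traceroute default base port (-p)
PROBES_PER_HOP = 3  # assumed default number of probes per hop (-q)


def probe_ports(max_hops: int) -> List[Tuple[int, int]]:
    """Return (ttl, destination UDP port) for each probe, classic traceroute style.

    A sequence counter is incremented before every probe, so the first probe
    goes to BASE_PORT + 1 and each later probe uses the next port up.
    """
    ports: List[Tuple[int, int]] = []
    seq = 0
    for ttl in range(1, max_hops + 1):
        for _ in range(PROBES_PER_HOP):
            seq += 1
            ports.append((ttl, BASE_PORT + seq))
    return ports


if __name__ == "__main__":
    for ttl, port in probe_ports(2):
        print(f"TTL {ttl}: UDP probe to port {port}")
```

`probe_ports(1)` yields ports 33435, 33436 and 33437 at TTL 1, the same three ports seen in the capture, which is what you would expect if the traceroute ended after its first hop.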
participants (2)
- henrik@hswn.dk
- tomg@mcclatchyinteractive.com