[hobbit] Dumb hobbit network test question
OK -- I used the --debug option; it wasn't as bad as I thought it would be, the resulting log was just over 11 MB when my problem occurred and I could turn it off.
Henrik, can you clarify what this really means?
Address=10.8.224.9:21, open=1, res=0, err=1, connecttime=0.003110, totaltime=10.063026,
Address=10.8.224.38:21, open=1, res=0, err=0, connecttime=0.003060, totaltime=0.028471, banner='220 wabash FTP server (Version 4.2 Sat Feb 5 10:12:55 CST 2005) ready. 221 Goodbye. ' (86 bytes) (good response)

2005-11-30 14:03:59 tcp_got_expected: No data in banner
2005-11-30 14:03:59 Adding to combo msg: status volga.ftp yellow <!-- [flags:OrdastILe] --> Wed Nov 30 14:03:00 2005 ftp NOT ok
This system is showing a load of 0.1 (max 2.0) on a 2-way 1.6 GHz machine; the FTP connect time is 17.4 microseconds (avg) and peaked in the last 48 hours at 5.2 milliseconds.
TIA
Tom Kauffman NIBCO, Inc
-----Original Message-----
From: Kauffman, Tom [mailto:KauffmanT at nibco.com]
Sent: Tuesday, November 29, 2005 11:28 AM
To: hobbit at hswn.dk
Subject: [hobbit] Dumb hobbit network test question
I've just done my tri-annual hardware shuffle, swapping out all my on-lease RS-6000 for brand spanking new systems. Part of this included upgrading from AIX 5.1 to AIX 5.3.
Now, on half the systems (the last set replaced) I get errors on the smtp and ftp tests -- typically one test every two hours. Interestingly, all these systems barf on the same test cycle. This is obviously something not quite right in the AIX config, but I'm at a loss on what. I just found out today that we also have production rsh scripts that time out on the same cycle (yeah, I know -- but they've been rsh since year dot, and getting them to ssh is real low on the list . . )
Here's a sample of the network test error:
Service ftp on hudson is not OK : Unexpected service response
Service smtp on hudson is not OK : Unexpected service response
How can I log the actual response? I'm currently running hobbit 4.03rc1; that's scheduled to change sometime next week.
Other suggestions?
TIA --
Tom Kauffman NIBCO, Inc
On Wed, Nov 30, 2005 at 03:38:56PM -0500, Kauffman, Tom wrote:
> Henrik, can you clarify what this really means?
> Address=10.8.224.9:21, open=1, res=0, err=1, connecttime=0.003110, totaltime=10.063026,
"open=1" means that the connection to the server succeeded. The interesting thing here is that it took only 0.003 seconds to get a connection, but then Hobbit spent more than 10 seconds waiting for a banner to appear. It never did - at least not within those 10 secs; the "err=1" means it gave up waiting for the data and signals a timeout.
> Address=10.8.224.38:21, open=1, res=0, err=0, connecttime=0.003060, totaltime=0.028471, banner='220 wabash FTP server (Version 4.2 Sat Feb 5 10:12:55 CST 2005) ready. 221 Goodbye.' (86 bytes)
This is a different server. Again, connecting takes about 0.003 secs, but the banner appears almost immediately - the entire exchange happens in 28 milliseconds.
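To make the comparison between the two records concrete, here is a minimal Python sketch that pulls the numeric fields out of such a debug line and computes how long the test waited after the connection was established. The function name `parse_record` and the derived `wait` field are illustrative only; they are not part of Hobbit, and the field layout is assumed to match the two lines quoted above.

```python
import re

# Only the fields that appear in the quoted debug lines are recognised.
FIELD_RE = re.compile(
    r"\b(Address|open|res|err|connecttime|totaltime)=([^,\s]+)"
)

def parse_record(line):
    """Parse one 'Address=..., open=..., ...' debug record into a dict."""
    rec = dict(FIELD_RE.findall(line))
    for key in ("open", "res", "err"):
        rec[key] = int(rec[key])
    for key in ("connecttime", "totaltime"):
        rec[key] = float(rec[key])
    # Time spent after the connect completed, i.e. waiting for the banner.
    rec["wait"] = rec["totaltime"] - rec["connecttime"]
    return rec

slow = parse_record(
    "Address=10.8.224.9:21, open=1, res=0, err=1, "
    "connecttime=0.003110, totaltime=10.063026,"
)
fast = parse_record(
    "Address=10.8.224.38:21, open=1, res=0, err=0, "
    "connecttime=0.003060, totaltime=0.028471,"
)

for rec in (slow, fast):
    status = "timeout" if rec["err"] else "ok"
    print(f'{rec["Address"]}: connect {rec["connecttime"]:.3f}s, '
          f'wait {rec["wait"]:.3f}s, {status}')
```

Both servers accept the connection in about 3 ms; the difference is entirely in the wait that follows.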
It might be that the FTP server performs a reverse DNS lookup of the Hobbit server's IP address when Hobbit connects to check the FTP service. Sometimes DNS lookups take a while - maybe long enough for Hobbit to reach the 10 seconds timeout. Maybe your ftp server has a local DNS cache, and the timeout only happens when the cached DNS entry expires and has to be refreshed.
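One way to check the reverse-DNS theory is to time the lookup from the FTP server itself. The following is a rough Python sketch, assuming Python is available on that machine; `time_reverse_lookup` is a hypothetical helper, and it goes through the system resolver, which may not be exactly the path the FTP daemon uses.

```python
import socket
import time

def time_reverse_lookup(ip):
    """Return (hostname or None, seconds taken) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers socket.herror / socket.gaierror: no PTR record
        # or a resolver failure.
        name = None
    return name, time.monotonic() - start

if __name__ == "__main__":
    # Placeholder address; substitute the Hobbit server's own IP.
    name, elapsed = time_reverse_lookup("127.0.0.1")
    print(f"{name!r} resolved in {elapsed:.3f}s")
```

Running this repeatedly over a couple of hours, with the Hobbit server's address in place of the placeholder, would show whether lookups occasionally take several seconds on the same cycle as the failing tests.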
One thing you can try is to add a "--timeout=30" option to the bbtest-net command in hobbitlaunch.cfg; that makes it wait up to 30 seconds before flagging a timeout.
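To see how long the banner really takes without waiting for the next failing test cycle, a small standalone probe can mimic the connect-then-wait behaviour described above. This is only a sketch of the idea, not how bbtest-net is implemented; `probe_banner` and its parameters are made up for illustration, and the timeout here applies separately to the connect and to the banner read.

```python
import socket
import time

def probe_banner(host, port, timeout=10.0):
    """Connect, then wait up to `timeout` seconds for the first banner bytes.

    Returns (connect_seconds, total_seconds, banner_bytes_or_None).
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connect_time = time.monotonic() - start
        sock.settimeout(timeout)
        try:
            banner = sock.recv(1024)
        except socket.timeout:
            banner = None  # connected, but no data arrived in time
    return connect_time, time.monotonic() - start, banner
```

For example, calling `probe_banner("hudson", 21, timeout=30)` in a loop from the Hobbit server and logging the total time would show whether the banner eventually arrives after 10 seconds or never arrives at all.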
Regards, Henrik