Hi Henrik,
I don't know if you read my previous response (see below), because it got sent using the wrong mail account. But I think I've found another issue: does the network retest procedure after a failed test ignore the "expect" setting in bb-services?
I tried to do some testing by deliberately misconfiguring the expect setting for the FTP test (I set it to 221 instead of 220), and now I get cyclical behaviour on the Hobbit server: it will turn all (five) FTP service tests yellow on the next test, but within a minute they all turn green again. Five minutes later they turn yellow again, back to green within a minute, etc. etc. This continues to happen until I put the expect 220 back in bb-services...
I don't think this is the correct behaviour?
Regards,
Eric.
Hi Henrik,
You're right, at least partially. I found out just now that the issue was with a misconfigured nsswitch.conf on the FTP server. That file still had entries for nis and nisplus in it, which caused the FTP banner response to be very slow (just about the length of the network test timeout I guess :-), due to the hostname lookup. The TCP connection would be established quickly, but the FTP banner didn't always appear in time.
But the weird thing is that some green FTP statuses (especially those following the yellow ones in the history) don't contain any response string either?!?
I only saw those FTP statuses at first, and they made me try to put in some debugging code to get the actual response on the web page, directly behind the "Unexpected service response" text. My first attempt crashed the bbtest-net executable the next time the failure occurred, precisely because there was no response, so I rewrote it to catch that and put in an explicit "(null)" text when no data was received. In the meantime, though, I found the cause of the issue.
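A minimal sketch of that kind of guard, written in Python rather than the C of bbtest-net, and not the actual patch. The function name describe_response and the "Service OK" wording are assumptions made for illustration; only the "Unexpected service response" text and the "(null)" placeholder come from the message above.

```python
def describe_response(expected, received):
    """Return (colour, status text) for a banner check.

    `received` may be None or empty when the server sent no data
    before the test gave up; that case must not be dereferenced.
    """
    # Substitute an explicit placeholder when nothing was received.
    shown = received.strip() if received else "(null)"
    if received and received.startswith(expected):
        return "green", f"Service OK: {shown}"
    return "yellow", f"Unexpected service response: {shown}"
```

With this shape, a missing banner yields a readable "(null)" on the status page instead of a crash, and a banner that arrives but does not start with the expected code is shown verbatim.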
Also, the "Seconds: N.NN" reported seems to be the time in which the TCP connection to the FTP server was established, not the total test time. That makes sense I suppose for the TCP timing statistics, but it threw me off-track in finding the solution for this problem. A yellow FTP status with 0.12 seconds duration did not indicate a timeout to me ;-)
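To make the difference between the two timings concrete, here is a rough Python sketch that measures the TCP connect time and the total time until the banner arrives separately. It is not how bbtest-net does it; the function name time_banner, the default timeout and the 1024-byte read size are assumptions chosen for illustration.

```python
import socket
import time

def time_banner(host, port=21, timeout=10.0):
    """Return (connect_secs, total_secs, banner) for one banner test.

    connect_secs covers only the TCP handshake; total_secs also
    includes the wait for the first data from the server. The same
    timeout value is applied to the connect and to the read.
    """
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    connect_secs = time.monotonic() - start
    try:
        sock.settimeout(timeout)
        try:
            banner = sock.recv(1024)
        except socket.timeout:
            # Connection was fine, but no banner arrived in time.
            banner = b""
    finally:
        sock.close()
    total_secs = time.monotonic() - start
    return connect_secs, total_secs, banner
```

Against a server with slow hostname lookups, connect_secs would stay small while total_secs approaches the timeout and banner comes back empty, which matches the yellow status with a short reported duration.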
BTW, I'm testing this on the 4.2 beta release with recent patches. I'm in the process of installing a new Hobbit server in our remote datacenter to monitor the production systems locally, so we won't experience Internet outages as downtime for our services (we're already running Hobbit remotely on two oldish servers from two remote offices, but outages in the ADSL connections get in the way of reporting actual service downtime to our customers). Alerts from the datacenter will go out through SMS. We're very happy with Hobbit so far!
Regards,
Eric.
Henrik Stoerner wrote:
On Mon, Jul 03, 2006 at 11:37:17AM +0200, Eric van de Meerakker (Mailings Lists) wrote:
I have a question on Hobbit: how can I find out what the exact "Unexpected service response" is on a network test? I have an FTP test that fails momentarily for (to me) mysterious reasons... Would it be possible to put the actual value of the unexpected service response in the error message?
It does that already, actually. If you don't see anything on the status page, it is because no data was received from the server. (And "no data" obviously doesn't match the "220" status we expect from an ftp server).
Regards, Henrik