Hello,
I have a question on Hobbit: how can I find out what the exact "Unexpected service response" is on a network test? I have an FTP test that fails momentarily for (to me) mysterious reasons... Would it be possible to put the actual value of the unexpected service response in the error message?
Regards,
Eric van de Meerakker.
On Mon, Jul 03, 2006 at 11:37:17AM +0200, Eric van de Meerakker (Mailings Lists) wrote:
I have a question on Hobbit: how can I find out what the exact "Unexpected service response" is on a network test? I have an FTP test that fails momentarily for (to me) mysterious reasons... Would it be possible to put the actual value of the unexpected service response in the error message?
It does that already, actually. If you don't see anything on the status page, it is because no data was received from the server. (And "no data" obviously doesn't match the "220" status we expect from an FTP server.)
Regards, Henrik
Hi Henrik,
You're right, at least partially. I found out just now that the issue was with a misconfigured nsswitch.conf on the FTP server. That file still had entries for nis and nisplus in it, which made the FTP banner response very slow because of the hostname lookup (just about the length of the network test timeout, I guess :-). The TCP connection would be established quickly, but the FTP banner didn't always appear in time.
But the weird thing is that some green FTP statuses (especially those following the yellow ones in the history) don't contain any response string either?!?
I only saw those FTP statuses at first, and they made me try to put in some debugging code to get the actual response on the web page, directly behind the "Unexpected service response" text. My first attempt crashed the bbtest-net executable the next time the failure occurred, precisely because there was no response. I rewrote it to catch that and put in an explicit "(null)" text when no data was received, but in the meantime I had found the cause of the issue.
Also, the "Seconds: N.NN" reported seems to be the time in which the TCP connection to the FTP server was established, not the total test time. That makes sense I suppose for the TCP timing statistics, but it threw me off-track in finding the solution for this problem. A yellow FTP status with 0.12 seconds duration did not indicate a timeout to me ;-)
BTW, I'm testing this on the 4.2 beta release with recent patches. I'm in the process of installing a new Hobbit server in our remote datacenter to monitor the production systems locally, so we won't experience Internet outages as downtime for our services. (We're already running Hobbit remotely on two oldish servers from two remote offices, but outages in the ADSL connections get in the way of reporting actual service downtime to our customers.) Alerts from the datacenter will go out through SMS. We're very happy with Hobbit so far!
Regards,
Eric.
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Hi Henrik,
I don't know if you read my previous response (see below), because it got sent using the wrong mail account. But I think I've found another issue: does the network retest procedure after a failed test ignore the "expect" setting in bb-services?
I tried to do some testing by deliberately misconfiguring the expect setting for the FTP test (I set it to 221 instead of 220), and now I get cyclical behaviour on the Hobbit server: it turns all (five) FTP service tests yellow on the next test, but within a minute they all turn green again. Five minutes later they turn yellow again, then back to green within a minute, and so on. This continues until I put the expect 220 back in bb-services...
I don't think this is the correct behaviour?
Regards,
Eric.
Hi Henrik,
You're right, at least partially. I found out just now that the issue was with a misconfigured nsswitch.conf on the FTP server. That file still had entries for nis and nisplus in it, which made the FTP banner response very slow because of the hostname lookup (just about the length of the network test timeout, I guess :-). The TCP connection would be established quickly, but the FTP banner didn't always appear in time.
But the weird thing is that some green FTP statuses (especially those following the yellow ones in the history) don't contain any response string either?!?
I only saw those FTP statuses at first, and they made me try to put in some debugging code to get the actual response on the web page, directly behind the "Unexpected service response" text. My first attempt crashed the bbtest-net executable the next time the failure occurred, precisely because there was no response. I rewrote it to catch that and put in an explicit "(null)" text when no data was received, but in the meantime I had found the cause of the issue.
Also, the "Seconds: N.NN" reported seems to be the time in which the TCP connection to the FTP server was established, not the total test time. That makes sense I suppose for the TCP timing statistics, but it threw me off-track in finding the solution for this problem. A yellow FTP status with 0.12 seconds duration did not indicate a timeout to me ;-)
BTW, I'm testing this on the 4.2 beta release with recent patches. I'm in the process of installing a new Hobbit server in our remote datacenter to monitor the production systems locally, so we won't experience Internet outages as downtime for our services. (We're already running Hobbit remotely on two oldish servers from two remote offices, but outages in the ADSL connections get in the way of reporting actual service downtime to our customers.) Alerts from the datacenter will go out through SMS. We're very happy with Hobbit so far!
Regards,
Eric.
Hi,
Is it possible to change the default storage history in Hobbit's RRD files? I'd like to be able to zoom in on a graph and have a precise view of the last 30 days, not just the last 48 hours.
olivier
On Mon, Jul 03, 2006 at 02:54:56PM +0200, Beau Olivier wrote:
Is it possible to change the default storage history in Hobbit's RRD files? I'd like to be able to zoom in on a graph and have a precise view of the last 30 days, not just the last 48 hours.
It's possible with the new hobbit-hostgraphs tool available in the snapshots. It lets you select one or more hosts/graphs and a time period, and then it shows the graph for just that period of time.
You can try it on http://www.hswn.dk/hobbit/ - go to the "Systems" page and pick the "Report" -> "Metrics report" menu-item.
Regards, Henrik
Cool, I like this new CGI!
But my real question is whether it is possible to have monthly data with a 5-minute average instead of a 2-hour average.
Olivier
On Mon, Jul 03, 2006 at 03:59:10PM +0200, Beau Olivier wrote:
Cool, I like this new CGI!
But my real question is whether it is possible to have monthly data with a 5-minute average instead of a 2-hour average.
You'll have to modify how the RRD files are created, as others have pointed out. For existing files, you can do an "rrdtool dump" of the current data, then create the RRD file with the same datasets but different RRAs, and import the old data into the new file.
It's tedious, but that is how RRDtool works - you cannot modify the definition of an RRD file after it has been created.
Regards, Henrik
I can't find this in my menu (menu-items.js.DIST) from today's snapshot.
Lars
Henrik,
First: AWESOME!
Second: Would it be possible to specify hours and minutes? I know rrdtool lets you specify not only dates but also hours and minutes, something like "-s 14:00 20060419 -e 15:00 20060419".
I haven't downloaded the code to see how you implemented the metric report, so I don't know how easy or hard this would be to implement.
Thanks, Jeff
On Mon, Jul 03, 2006 at 05:43:35PM -0500, Jeff Newman wrote:
Second: Would it be possible to specify hours and minutes? I know rrdtool lets you specify not only dates but also hours and minutes, something like "-s 14:00 20060419 -e 15:00 20060419".
I haven't downloaded the code to see how you implemented the metric report, so I don't know how easy or hard this would be to implement.
Not hard at all. I just modified the code to handle hour/minute/second selections if they are provided by the web form that calls the hostgraphs CGI. So although I won't add it to the default hostgraphs_form, you can add your own drop-downs with hour/minute/second selections and get your graphs for whatever period you like.
Henrik
Beau Olivier wrote:
Hi,
Is it possible to change the default storage history in Hobbit's RRD files? I'd like to be able to zoom in on a graph and have a precise view of the last 30 days, not just the last 48 hours.
It seems the settings are hardcoded into hobbitd/do_rrd.c :
static char rra1[] = "RRA:AVERAGE:0.5:1:576";
static char rra2[] = "RRA:AVERAGE:0.5:6:576";
static char rra3[] = "RRA:AVERAGE:0.5:24:576";
static char rra4[] = "RRA:AVERAGE:0.5:288:576";
You'll have to modify these and recompile.
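To see what those four RRA definitions mean in practice, the retention of each archive can be worked out as step x steps-per-row x rows. The sketch below assumes a base step of 300 seconds (one sample every five minutes); that value is not stated in the thread, but it matches the "48 hours" and "2 hour average" figures mentioned above. The function name rra_span is made up for this illustration.

```shell
#!/bin/sh
# Assumption: the RRD base step is 300 seconds (one sample per 5 minutes).
STEP=300

# Print resolution and retention for one RRA, given steps-per-row and row count.
rra_span() {
    steps=$1
    rows=$2
    res_min=$(( STEP * steps / 60 ))
    span_hours=$(( STEP * steps * rows / 3600 ))
    echo "steps=$steps rows=$rows resolution=${res_min}min retention=${span_hours}h"
}

rra_span 1 576     # 5-minute samples kept for 48 hours
rra_span 6 576     # 30-minute averages kept for 288 hours (12 days)
rra_span 24 576    # 2-hour averages kept for 1152 hours (48 days)
rra_span 288 576   # 1-day averages kept for 13824 hours (576 days)

# Rows needed to keep 5-minute samples for 30 days under the same assumption.
rows_30d=$(( 30 * 24 * 3600 / STEP ))
echo "RRA:AVERAGE:0.5:1:$rows_30d"
```

Under that assumption, keeping 30 days at 5-minute resolution would mean raising the row count of the first RRA from 576 to 8640, at the cost of larger RRD files. Only newly created files would pick up the change.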
On Mon, Jul 03, 2006 at 02:17:43PM +0200, Eric van de Meerakker wrote:
I don't know if you read my previous response (see below), because it got sent using the wrong mail account. But I think I've found another issue: does the network retest procedure after a failed test ignore the "expect" setting in bb-services?
I tried to do some testing by deliberately misconfiguring the expect setting for the FTP test (I set it to 221 instead of 220), and now I get cyclical behaviour on the Hobbit server: it turns all (five) FTP service tests yellow on the next test, but within a minute they all turn green again. Five minutes later they turn yellow again, then back to green within a minute, and so on. This continues until I put the expect 220 back in bb-services...
I don't think this is the correct behaviour?
Doesn't sound right, I'll have to agree.
The retest procedure should use the same parameters as the normal tests, but your experiment shows that it might not. Just to verify this, could you try modifying the ~hobbit/server/ext/bbretest-net.sh script and add "--check-response" to the bbtest-net command in there (after the "cat $REDOFILE") ?
Regards, Henrik
OK, I just tested that. The changed line was:
$BBHOME/bin/bbtest-net `cat $REDOFILE` --check-response
I've got the content of the "frequenttests." (the . is really there!) file for you as well:
"--ping" "--checkresponse" <site> <site2> <site3> <site4>
(So four sites in all. My count was off by one in my previous mail. I blame the heat ;-)
The change makes no difference: it is still yellow/green/yellow/green for all four sites simultaneously when I change the FTP expect setting in bb-services.
Regards,
Eric.
Hi Henrik,
I just redid the test with --checkresponse instead of --check-response:
$BBHOME/bin/bbtest-net `cat $REDOFILE` --checkresponse
That one seems to produce the desired effect: tests are stable at yellow when I change the FTP expect setting.
I thought it might not be working properly because of the quotes in the "frequenttests." file. I then removed the --checkresponse and put an 'eval' in front of that line instead:
eval $BBHOME/bin/bbtest-net `cat $REDOFILE`
With that, it all works properly again (at least on my SuSE SLES9 installation). If not a fix, at least it is a workaround...
Regards,
Eric.
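The quoting suspicion is consistent with how the shell treats command substitution: the output of `cat` is split on whitespace, but quote characters inside it are kept as literal text, whereas eval re-parses the line so the quotes are removed. The sketch below illustrates only that shell behaviour. show_args is a hypothetical stand-in for bbtest-net, and the file content merely imitates the "frequenttests." line quoted earlier; whether bbtest-net actually ignores an option that arrives with literal quotes is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical stand-in for bbtest-net: prints each argument it receives in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    printf '\n'
}

# Hypothetical options file using the same quoting style as "frequenttests.".
REDOFILE=$(mktemp)
echo '"--ping" "--checkresponse" site1' > "$REDOFILE"

# Plain command substitution: words are split, but the quotes stay in the arguments.
plain=$(show_args `cat $REDOFILE`)

# With eval: the expanded line is parsed again, so the quotes are stripped.
evaled=$(eval show_args `cat $REDOFILE`)

echo "without eval: $plain"
echo "with eval:    $evaled"
rm -f "$REDOFILE"
```

Without eval the option reaches the program as "--checkresponse" including the quote characters, which a simple option parser would probably not match; with eval it arrives as plain --checkresponse.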