For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
This is only happening for some hosts (most of them). The RRD files have timestamps indicating that they are being updated, but an 'rrdtool dump' of one of the tcp RRDs shows that every value since the problem started is 0. No changes have been made to the configuration and, as I mentioned, the actual tests are working.
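One quick way to confirm what the dump shows is to count rows whose values are all zero. The sketch below is a minimal Python example, assuming the `rrdtool dump` output is available as an XML string. `classify_rows` and the trimmed `SAMPLE_DUMP` excerpt are hypothetical; a real dump also contains header elements, data-source definitions and several RRAs.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed excerpt of `rrdtool dump` output.
SAMPLE_DUMP = """<rrd>
  <rra>
    <database>
      <row><v>2.3000000000e-01</v></row>
      <row><v>NaN</v></row>
      <row><v>0.0000000000e+00</v></row>
      <row><v>0.0000000000e+00</v></row>
    </database>
  </rra>
</rrd>"""


def classify_rows(dump_xml: str) -> dict:
    """Count rows that are entirely NaN (unknown), entirely zero, or other."""
    counts = {"zero": 0, "unknown": 0, "data": 0}
    root = ET.fromstring(dump_xml)
    for row in root.iter("row"):
        # float() accepts "NaN" and tolerates surrounding whitespace.
        values = [float(v.text) for v in row.findall("v")]
        if all(math.isnan(x) for x in values):
            counts["unknown"] += 1
        elif all(x == 0.0 for x in values):
            counts["zero"] += 1
        else:
            counts["data"] += 1
    return counts


if __name__ == "__main__":
    print(classify_rows(SAMPLE_DUMP))
```

A long run of "zero" rows with no "unknown" rows means the file is being written on schedule, just with zero values.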
I'm running a Hobbit snapshot from around Sep 5.
-- Geoff Steer <gsteer at firstwave.com.au>
-------------------------------Safe Stamp----------------------------------- The sender's Anti-virus Service scanned this email. It is safe from known viruses.
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
Any messages in /var/log/hobbit/rrd-status.log ?
Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME" ?
Are the graphs missing from both the individual status view (e.g. the "smtp" detailed status should have a graph at the bottom), and from the combined view on the "trends" page ? Or just one of them ?
Henrik
On Wed, 2005-09-28 at 07:28 +0200, Henrik Stoerner wrote:
Any messages in /var/log/hobbit/rrd-status.log ?
Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME" ?
Are the graphs missing from both the individual status view (e.g. the "smtp" detailed status should have a graph at the bottom), and from the combined view on the "trends" page ? Or just one of them ?
tail of /var/log/hobbit/rrd-status.log:
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin5.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin3.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.smtp.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-28 14:53:04 Tried to down BOARDBUSY: Invalid argument
2005-09-28 14:54:10 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:12:14 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:22:46 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:24:37 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:26:11 Tried to down BOARDBUSY: Invalid argument
NOTE: vwall.test.firstwave.com.au is not one of the hosts showing this problem. Clocks are synced to an NTP server running on the Hobbit server.
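The "minimum one second step" errors come from rrdtool's rule that each update must carry a timestamp strictly later than the file's last update, so a second update arriving within the same second is rejected. The toy Python model below illustrates only that ordering rule. `ToyRRD` and `RRDStepError` are hypothetical names, not part of rrdtool or Hobbit, and consolidation, heartbeat and storage are deliberately left out.

```python
class RRDStepError(ValueError):
    """Raised when an update is not strictly later than the last one."""


class ToyRRD:
    """Toy model of rrdtool's timestamp ordering check on update."""

    def __init__(self, last_update: int) -> None:
        self.last_update = last_update

    def update(self, timestamp: int, value: float) -> None:
        # The value is ignored here; only the timestamp rule is modelled.
        if timestamp <= self.last_update:
            raise RRDStepError(
                f"illegal attempt to update using time {timestamp} "
                f"when last update time is {self.last_update} "
                "(minimum one second step)"
            )
        self.last_update = timestamp


if __name__ == "__main__":
    rrd = ToyRRD(last_update=1127696987)
    try:
        rrd.update(1127696987, 0.05)  # same second as the last update
    except RRDStepError as err:
        print(err)
    rrd.update(1127696988, 0.05)  # one second later is accepted
    print(rrd.last_update)
```

Each such error drops a single sample; it does not stop later updates from being stored.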
An ls -l of one host (only TCP-related RRDs shown):
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ssh.rrd
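A minimal Python sketch for checking that the tcp.*.rrd files are still being written, assuming they live in one directory per host as in the listing above. `stale_rrds` and the 600-second default threshold are assumptions for illustration. As this thread shows, a fresh modification time only proves that updates arrive, not that they carry non-zero values.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_rrds(rrd_dir: str, max_age_s: float = 600.0,
               now: Optional[float] = None) -> List[str]:
    """Return names of tcp.*.rrd files not modified within max_age_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(rrd_dir).glob("tcp.*.rrd")):
        # st_mtime is the last time the file's contents were modified.
        if now - path.stat().st_mtime > max_age_s:
            stale.append(path.name)
    return stale
```

For example, `stale_rrds(os.path.expanduser("~hobbit/data/rrd/HOSTNAME"))` (with `os` imported) returns an empty list when every tcp RRD for that host was written in the last ten minutes.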
The problem shows up in both the trends and the detailed graphs.
-- Geoff Steer <gsteer at firstwave.com.au>
In <1127887079.13766.6.camel at newtoy.homelan> Geoff Steer <gsteer at firstwave.com.au> writes:
tail of /var/log/hobbit/rrd-status.log:
Nothing unusual in there.
An ls -l of one host (only TCP-related RRDs shown):
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd
And the files are being updated.
Could you send me (directly, not to the list) the output from bb 127.0.0.1 "hobbitdboard host=HOSTNAME fields=msg" as well as the RRD files for this host ?
When viewing the detailed status for e.g. ldap or smtp, do you get a graph image that is empty, or no image at all ?
Henrik
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
Geoff and I looked into this and he let me look at some of his data.
Apparently his servers respond faster than the resolution Hobbit records: Hobbit logs every response time under 10 ms as "0.00 seconds", so the TCP response-time graphs show a flat line at zero.
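The effect is easy to reproduce. The Python sketch below assumes response times are stored with two decimal places and truncated, which matches the statement that anything under 10 ms is logged as 0.00 seconds. `as_logged` and the sample timings are hypothetical and do not come from Hobbit's code. `Decimal` is used to avoid binary floating-point surprises when truncating.

```python
from decimal import Decimal, ROUND_DOWN


def as_logged(seconds: float) -> str:
    """Render a response time with two decimals, truncating anything finer."""
    return str(Decimal(str(seconds)).quantize(Decimal("0.01"), rounding=ROUND_DOWN))


if __name__ == "__main__":
    samples_ms = [2.1, 3.8, 0.9, 7.4, 9.9, 4.2]  # hypothetical LAN timings
    print([as_logged(ms / 1000) for ms in samples_ms])  # all "0.00"
    print(as_logged(0.25))  # a slower server still shows up: "0.25"
```

Every sample below 10 ms collapses to the same value, which is why the TCP lines disappear into the x-axis while the ping line remains visible.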
So there is no bug, just some very speedy servers.
Regards, Henrik
participants (2)
- gsteer@firstwave.com.au
- henrik@hswn.dk