OK, with everyone's help I have made progress. After trying all the different suggestions it came down to: Why can't I get an "rrdtool dump" output? The reason was that sometime in the past someone (probably me) managed to replace the rrdtool binary with an empty file (stop sniggering at the back please).
Having done this before, I know how it happens... you type your command line:
/opt/rrdtool/bin/rrdtool dump postfixqueue.rrd
but while using bash command-line editing you manage to put a > at the start, making the command line:
>/opt/rrdtool/bin/rrdtool dump postfixqueue.rrd
bash treats the leading > as an output redirection, so it truncates the rrdtool binary to zero length before anything runs.
The file still keeps its execute permission, but executing an empty file returns nothing...
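For anyone who wants to check whether the same accident has hit other binaries, here is a minimal sketch in Python. It is not anything that ships with Hobbit or rrdtool; the function name empty_executables is made up for illustration, and /opt/rrdtool/bin is simply the path from the command line above.

```python
import os
import stat


def empty_executables(directory):
    """Return sorted names of regular files in `directory` that carry an
    execute bit but are zero bytes long (hypothetical helper)."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            info = os.stat(path)
        except OSError:
            # Skip entries that cannot be inspected, e.g. dangling symlinks.
            continue
        if not stat.S_ISREG(info.st_mode):
            continue
        has_exec_bit = bool(info.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
        if has_exec_bit and info.st_size == 0:
            found.append(name)
    return found


if __name__ == "__main__":
    # Assumed install location, taken from the command line in this message.
    bin_dir = "/opt/rrdtool/bin"
    if os.path.isdir(bin_dir):
        for name in empty_executables(bin_dir):
            print("empty executable:", os.path.join(bin_dir, name))
```

A zero-length file with its execute bit still set is the signature of this kind of truncation, which is why the sketch checks both conditions together.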
So, having got a real, working copy of the rrdtool program and running it on the dodgy data file, I can see that data is indeed being stored there, and a vast number of lines look like this:
<!-- 2008-07-25 12:45:00 UTC / 1216989900 --> <row><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v></row>
A copy from one of the working ones shows:
<!-- 2008-07-25 13:00:00 UTC / 1216990800 --> <row><v> 5.5427200000e+03 </v><v> 2.1861333333e+02 </v><v> 1.4601324333e+05 </v><v> 0.0000000000e+00 </v><v> 4.0939317667e+05 </v></row>
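To get a feel for how much of a dump is NaN without scrolling through it, something like the following Python sketch could be used. It assumes the dump rows look like the two sample lines above (one <row> holding several <v> values); summarize_rows is a made-up name, not part of rrdtool.

```python
import re

# Patterns modelled on the sample dump lines quoted in this message.
ROW_RE = re.compile(r"<row>(.*?)</row>", re.DOTALL)
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def summarize_rows(dump_text):
    """Return (total_rows, rows_where_every_value_is_NaN)."""
    total = 0
    all_nan = 0
    for row in ROW_RE.findall(dump_text):
        values = VALUE_RE.findall(row)
        if not values:
            continue
        total += 1
        if all(v.lower() == "nan" for v in values):
            all_nan += 1
    return total, all_nan


if __name__ == "__main__":
    import sys
    # Usage (assumed): rrdtool dump postfixqueue.rrd | python this_script.py
    total, all_nan = summarize_rows(sys.stdin.read())
    print("%d rows, %d entirely NaN" % (total, all_nan))
```

If nearly every row is entirely NaN, the file is being created but no usable values are reaching it, which matches what the dump of the broken host shows.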
So it seems to be a problem with translating the output from the client program into data that RRD can understand.
Now, here are the contents of the hostlogs file of the working server; this should tie up with the data entry above:
red Friday July 25 12:59:31 UTC 2008
<br><br>
<pre>
ActiveStatus: &red ActiveQueue: 5494 ActiveTrend: tendency <b>rising</b> with <b>-81</b> mails. BounceStatus: &green BounceQueue: 219 BounceTrend: tendency <b>rising</b> <b>-2</b> mails. DeferStatus: &green DeferQueue: 145971 DeferTrend: amount equal to last measure. CorruptStatus: &green CorruptQueue: 0 CorruptTrend: amount equal to last measure. IncomingStatus: &red IncomingQueue: 409494 IncomingTrend: tendency <b>falling</b> with <b>858</b> mails.
</pre> Status unchanged in 0.00 minutes Message received from 10.44.107.107 Client data ID 1216990657
and here are the contents of the non-working one:
red Friday July 25 12:45:04 UTC 2008
<br><br>
<pre>
ActiveStatus: &green ActiveQueue: 39 ActiveTrend: tendency <b>falling</b> with <b>973</b> mails. BounceStatus: &green BounceQueue: 58 BounceTrend: amount equal to last measure. DeferStatus: &red DeferQueue: 154348 DeferTrend: tendency <b>falling</b> with <b>865</b> mails. CorruptStatus: &green CorruptQueue: 0 CorruptTrend: amount equal to last measure. IncomingStatus: &red IncomingQueue: 206927 IncomingTrend: tendency <b>rising</b> with <b>-206926</b> mails.
Deferred Queue is too high but is decreasing already.<br>
</pre> Status unchanged in 0.00 minutes Message received from 10.44.107.105 Client data ID 1216989837
As mentioned previously all these servers use the same scripts to send the data to the server and the same scripts to process it once it arrives, indeed as you can see above the two different entries look identical in format. I checked the scripts on the remote servers to see if there were any differences between them and found a few minor differences but nothing huge. Still, just to be sure I copied the postfixqueue.sh script from a working server to the broken one and waited for it to run. Alas, although the script transmits sensible data back to the Hobbit server:
ActiveStatus: &green ActiveQueue: 448 ActiveTrend: tendency falling with 9 mails. BounceStatus: &green BounceQueue: 59 BounceTrend: tendency rising -1 mails. DeferStatus: &green DeferQueue: 149697 DeferTrend: amount equal to last measure. CorruptStatus: &green CorruptQueue: 0 CorruptTrend: amount equal to last measure. IncomingStatus: &red IncomingQueue: 213848 IncomingTrend: amount equal to last measure.
the rrd file STILL contains:
<!-- 2008-07-25 13:45:00 UTC / 1216993500 --> <row><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v><v> NaN </v></row>
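One way to compare what the two hosts send is to pull the five queue numbers out of each status message and diff the results. The Python below is only a sketch for that purpose: it is NOT how Hobbit's NCV handling parses the message, and queue_values is a made-up name. It just looks for "SomethingQueue: number" pairs like the ones shown above.

```python
import re

# Matches pairs such as "ActiveQueue: 448" in the status text.
QUEUE_RE = re.compile(r"(\w+Queue):\s*(-?\d+)")


def queue_values(status_text):
    """Return a dict mapping each *Queue name to its integer value."""
    return {name: int(value) for name, value in QUEUE_RE.findall(status_text)}


if __name__ == "__main__":
    message = (
        "ActiveStatus: &green ActiveQueue: 448 ActiveTrend: tendency falling with 9 mails. "
        "BounceStatus: &green BounceQueue: 59 BounceTrend: tendency rising -1 mails."
    )
    for name, value in sorted(queue_values(message).items()):
        print(name, value)
```

Running this against the message from a working host and the one from the broken host would at least confirm whether both carry the same five names with plain integer values.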
Any RRD experts got any ideas?
|\/|artin
-----Original Message-----
From: Phil Wild [mailto:philwild at gmail.com]
Sent: 24 July 2008 17:42
To: hobbit at hswn.dk
Subject: Re: [hobbit] Graphs are missing data, but it's there!
The rrd version should be okay; after all, it is graphing data from other hosts with no problem.
It would appear that your ncv and graph configurations are correct, as you say they are working for other hosts. This would indicate it is a problem with this host's configuration, so where to look...
Just out of interest, can you take an rrd file for this test from a host that works, and copy it into the .../data/rrd/hostname directory of the host that does not?
I would expect after doing this that you will have a graph for this host. Can you confirm this works? After doing this and leaving it for 10 minutes, do you see any new data in the graph?
Can you dump the data from this rrd file?
2008/7/25 Ward, Martin <Martin.Ward at colt.net>:
> Are you saying that you run the same tests on multiple hosts and only one host is not showing data?
Yes.
> Does this mean they all share the same NCV configuration in hobbitserver.cfg and the same graph definition in hobbitgraph.cfg?
Yes.
> What if you remove the rrd file and let hobbit create a new one, does that help?
I did this and, as you'd expect, initially the web page showed no graph although it did show data (stored from the previous run I presume). After an interval the file appeared again, but running "rrdtool dump" on it STILL failed to produce any data.
I'm starting to wonder about the versions of RRD, but they ought to be data-compatible; I'm using rrdtool v1.2.15.
The histlogs show no errors, the hist/mc25,... data file contains valid data. I DO get a few RRD errors like this:
rrd-status.log:2008-07-21 09:46:19 RRD error updating /opt/hobbit/data/rrd/mc25.lon.server.colt.net/tcp.smtp.rrd from 10.44.107.48: illegal attempt to update using time 1216633579 when last update time is 1216633579 (minimum one second step)
which make it look like Hobbit is actually updating the RRD file... I just can't get any data out!
|\/|
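The "minimum one second step" error above says an update was refused because its timestamp was not later than the last one stored. As a rough illustration of that rule only (this is a sketch based on the wording of the error message, not rrdtool's code, and rejected_updates is a made-up name):

```python
def rejected_updates(timestamps, last_update):
    """Return the timestamps that would be refused because they are not
    strictly later than the most recently accepted update."""
    rejected = []
    for ts in timestamps:
        if ts <= last_update:
            rejected.append(ts)
        else:
            # Accepted: this becomes the new "last update time".
            last_update = ts
    return rejected


if __name__ == "__main__":
    # The time from the log line above, arriving twice, then a later update.
    print(rejected_updates([1216633579, 1216633579, 1216633879], 1216633578))
```

In the log line both times are 1216633579, so the second of two updates carrying the same timestamp would be the one refused.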
-----Original Message-----
From: Phil Wild [mailto:philwild at gmail.com]
Sent: 24 July 2008 16:31
To: hobbit at hswn.dk
Subject: Re: [hobbit] Graphs are missing data, but it's there!
Are you saying that you run the same tests on multiple hosts and only one host is not showing data?
Does this mean they all share the same NCV configuration in hobbitserver.cfg and the same graph definition in hobbitgraph.cfg?
If this is correct, then it really points to something not getting into the rrd file. As previously suggested, rrd dump is your best bet at finding the problem here.
What if you remove the rrd file and let hobbit create a new one, does that help?
Cheers
Phil
2008/7/24 Hubbard, Greg L <greg.hubbard at eds.com>:
You know the data exists because you used the rrd dump tool to display it? Is the graph simply not shown at all, or is there a "hole" in the Web page where it normally would go? ("show page source" might have a clue).
Some ideas/shots in the dark:
a) check the logs
b) meticulously compare a "working" system to the non-working system, and make sure that they really are identical.
c) look at the trends page for this host to see if the graph is okay there...
Etc. I am sure you know the drill -- a big pain to look under every rock, but it has to be done...
GLH
From: Ward, Martin [mailto:Martin.Ward at colt.net]
Sent: Thursday, July 24, 2008 8:21 AM
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing data, but it's there!
Thanks for the suggestion but that didn't work (I guess you meant rrd). Any other ideas?
|\/|
-----Original Message-----
From: Roberts, James [mailto:James.Roberts at hants.gov.uk]
Sent: 24 July 2008 12:47
To: hobbit at hswn.dk
Subject: RE: [hobbit] Graphs are missing data, but it's there!
you need to touch all the rdd.
From: Ward, Martin [mailto:Martin.Ward at colt.net]
Sent: 24 July 2008 12:43
To: hobbit at hswn.dk
Subject: [hobbit] Graphs are missing data, but it's there!
All,
I have a problem with one machine where its data is not being shown in the graphs even though the data exists.
The machine in question's Hobbit client sends five pieces of numeric data (email queues) and these are displayed on the web page for this service:
====
Thursday July 24 11:29:11 UTC 2008
ActiveStatus: green ActiveQueue: 106 ActiveTrend: tendency rising with -60 mails.
BounceStatus: green BounceQueue: 58 BounceTrend: tendency falling with 3 mails.
DeferStatus: red DeferQueue: 150464 DeferTrend: tendency falling with 95 mails.
CorruptStatus: green CorruptQueue: 0 CorruptTrend: amount equal to last measure.
IncomingStatus: red IncomingQueue: 247049 IncomingTrend: amount equal to last measure.
Deferred Queue is too high but is decreasing already.
====
These numbers change over time and the values are accurate.
However, the graph that is displayed below this data is blank. I have historic data, the files exist, and what is more I have other machines that are configured identically to this one where the data IS graphed correctly.
Hobbit graphs are a bit of a black hole to me, can anyone suggest where I might look?
|\/|artin
The message is intended for the named
addressee only and may not be disclosed to or used by anyone else, nor may it be copied in any way. The contents of this message and its attachments are confidential and may also be subject to legal privilege. If you are not the named addressee and/or have received this message in error, please advise us by e-mailing security at colt.net and delete the message and any attachments without retaining any copies. Internet communications are not secure and COLT does not accept responsibility for this message, its contents nor responsibility for any viruses. No contracts can be created or varied on behalf of COLT Telecommunications, its subsidiaries or affiliates ("COLT") and any other party by email Communications unless expressly agreed in writing with such other party. Please note that incoming emails will be automatically scanned to eliminate potential viruses and unsolicited promotional emails. For more information refer to www.colt.net or contact us on +44(0)20 7390 3900.
--
Tel: 0400 466 952
Fax: 0433 123 226
email: philwild AT gmail.com
To unsubscribe from the hobbit list, send an e-mail to
hobbit-unsubscribe at hswn.dk