These two graphs were generated from the same script, sending the same data to two different tests:
http://www.flickr.com/photos/betsys99/7705982914/in/photostream
You can see that the first test has chunks missing at regular intervals. Both graphs come from the same data, sent by the same Perl script to two different tests, RTM and rtmstats. I originally set this up when I was having trouble with RRD choking on the extra information and wanted to separate the stats, but I've kept it because one of the graphs is so spotty. On the client side, my Perl script for the test logs that it is sending the data correctly, and I believe the data always displays correctly on the test page.
On the xymon server, I see the rrd file for RTM is missing some entries that are populated on the rtmstats, for example:
RTM:
<!-- 2012-08-01 17:05:00 EDT / 1343855100 --> <row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
<!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
rtmstats:
<!-- 2012-08-01 17:05:00 EDT / 1343855100 --> <row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
<!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>3.8017446000e+05</v><v>3.7497760000e+05</v><v>1.0860800000e+03</v><v>4.1067800000e+03</v><v>5.1928600000e+03</v><v>1.3678000000e+00</v></row>
What could be causing this discrepancy?
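A comparison like the one above can be automated rather than done by eye. The following is a minimal sketch in Python, not part of the original script: it assumes both inputs are text fragments of "rrdtool dump" output from a single archive each, and the function names are hypothetical. It lists the timestamps where the first dump holds only NaN while the second has real values.

```python
import math
import re

# Matches one archive row as printed by "rrdtool dump":
#   <!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>...</v>...</row>
ROW_RE = re.compile(r"<!--\s*(.*?)\s*/\s*(\d+)\s*-->\s*<row>(.*?)</row>")
VALUE_RE = re.compile(r"<v>\s*(.*?)\s*</v>")


def parse_rows(dump_text):
    """Return {epoch: [float, ...]} for every <row> found in dump_text.

    Assumption: the text covers one archive only; rows from several
    archives sharing a timestamp would overwrite each other here.
    """
    rows = {}
    for _label, epoch, body in ROW_RE.findall(dump_text):
        rows[int(epoch)] = [float(v) for v in VALUE_RE.findall(body)]
    return rows


def missing_only_in_first(first_dump, second_dump):
    """Epochs where the first dump is all NaN but the second has data."""
    first, second = parse_rows(first_dump), parse_rows(second_dump)
    gaps = []
    for epoch in sorted(first):
        first_empty = all(math.isnan(v) for v in first[epoch])
        other = second.get(epoch)
        if first_empty and other and not any(math.isnan(v) for v in other):
            gaps.append(epoch)
    return gaps
```

Feeding it the RTM and rtmstats fragments quoted above should flag 1343855400 (17:10:00 EDT) as the interval present only in rtmstats.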
Looking at that interval in the client-side log:
/home/xymon/client/bin/bb 10.100.5.42 'status+12h myhost.example.com.RTM green Wed Aug 1 17:04:15 2012 Total : 357039 Success : 352086 Temp_Errors : 1058 Other_Errors : 3891 Total_Errors : 4949 Percent_Failure : 1.39% <SNIP lots more stuff>
'
/home/xymon/client/bin/bb 10.100.5.42 'status+12h myhost.example.com.rtmstats green Wed Aug 1 17:04:15 2012 Total : 357039 Success : 352086 TempErrors : 1058 OtherErrors : 3891 TotalErrors : 4949 PercentFailure : 1.39%
I accidentally snipped off the later interval there, but it still matches:
/home/xymon/client/bin/bb 10.100.5.42 'status+12h myhost.example.com.RTM green Wed Aug 1 17:09:20 2012 Total : 377910 Success : 372738 Temp_Errors : 1083 Other_Errors : 4085 Total_Errors : 5168 Percent_Failure : 1.37% <SNIP otherstuff>
'
/home/xymon/client/bin/bb 10.100.5.42 'status+12h myhost.example.com.rtmstats green Wed Aug 1 17:09:20 2012 Total : 377910 Success : 372738 TempErrors : 1083 OtherErrors : 4085 TotalErrors : 5168 PercentFailure : 1.37%
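The claim that the two messages match can be checked mechanically. Here is a rough Python sketch with hypothetical helper names that pulls the "Name : value" pairs out of two logged status messages and reports any shared field whose value differs. It treats Temp_Errors and TempErrors as the same field, because the logged commands spell the names differently for the two tests. Whether that naming difference matters to the RRD is not established in this thread.

```python
import re

# "Name : value" pairs as they appear in the logged status messages.
# Requiring whitespace around the colon keeps "17:09:20" from matching.
FIELD_RE = re.compile(r"([A-Za-z_]\w*)\s+:\s+([0-9.]+)%?")


def parse_fields(message):
    """Return {normalized_name: float} for each 'Name : value' pair."""
    fields = {}
    for name, value in FIELD_RE.findall(message):
        # Drop underscores and case so Temp_Errors and TempErrors compare equal.
        fields[name.replace("_", "").lower()] = float(value)
    return fields


def differing_fields(first_message, second_message):
    """Names present in both messages whose values differ."""
    a, b = parse_fields(first_message), parse_fields(second_message)
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
```

Fields that appear in only one message (the snipped extra data in RTM, for instance) are ignored by design, since only the shared statistics are graphed by both tests.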
I've seen this before but I don't remember the exact cause; I just remember that working on RRD stuff tended to add gray hair. Do both of these graphs have essentially the same RRD configs? If the foo.rrd files do not contain the detail for the tests, I would truncate foo.rrd and recheck my [foo] graph stanza (or rather, save a copy of foo.rrd first).
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] On Behalf Of Betsy Schwartz [betsy.schwartz at gmail.com] Sent: Friday, August 03, 2012 12:34 PM To: xymon at xymon.com Subject: Re: [Xymon] Another weirdness - RRD info has gaps
On Fri, Aug 3, 2012 at 4:17 PM, Tim McCloskey <tm at freedom.com> wrote:
I've seen this before but I don't remember the exact cause, I just remember that working on rrd stuff tended to add gray hair. Do both of these graphs have essentially the same rrd configs? If the foo.rrd files do not contain the detail for the tests I would truncate foo.rrd, and recheck my [foo] graph stanza. (rather, save a copy off foo.rrd first)
It's got *some* data but it's got gaps. I dropped the test, erased the RRD file and put it back, and I am *still* seeing gaps.
I swear my two tests are getting the exact same info:
# Stats block shared by both tests
push( @rtmstats, " \n Total : $totalcnt \n Success : $success \n TempErrors : $temperror\n OtherErrors : $othererror \n TotalErrors : $totalerror \n PercentFailure : $failure% \n" );
push ( @bbdata, @rtmstats);
push ( @bbdata,< SNIP lots of other stuff>);
# Full status message for the main test
my $bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME $color $date @bbdata \n'";
system("$bbcmd");
print $bbcmd;
# Stats-only status message for rtmstats
$bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green $date @rtmstats \n'";
system("$bbcmd");
print $bbcmd;
comments at the bottom
Do both of these graphs have essentially the same rrd configs?
This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it. Still, these two statements are glued together in different ways.
my $bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME $color $date @bbdata \n'";
$bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green $date @rtmstats \n'";
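Both statements build one shell command string and wrap the message in single quotes, so anything in the message body that the shell treats specially (a stray single quote, for example) could alter what is actually sent. Nothing in this thread shows that this is the cause of the gaps; it is only one thing that could be ruled out. The sketch below, in Python for illustration and with hypothetical function names, passes the message as a single argument without involving a shell. In Perl the equivalent would be the list form of system().

```python
import subprocess


def build_status_command(xymon_bin, server, host, test, color, date, lines,
                         lifetime="12h"):
    """Return the argv list for one Xymon status message.

    The whole message is one argument, so quotes or other shell
    metacharacters inside `lines` are never interpreted by a shell.
    """
    # Perl's "@array" interpolation joins elements with a space by default.
    body = " ".join(lines)
    message = f"status+{lifetime} {host}.{test} {color} {date} {body} \n"
    return [xymon_bin, server, message]


def send_status(argv):
    """Run the client without a shell and return its exit status."""
    return subprocess.run(argv, check=False).returncode
```

Because both tests would go through the same build_status_command call, any remaining difference between them would have to come from the data itself or from the server side.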
- Do both of these graphs have essentially the same rrd configs?
The graphs have identical RRD configs, cut'n'paste with the name change.
- This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it. Still, these two statements are glued together in different ways.
I'm getting *most* of the data in RRD, and I have not caught any gaps on the Xymon test page.
Thanks,
Betsy
On 03-08-2012 23:42, Betsy Schwartz wrote:
It's got *some* data but it's got gaps. I dropped the test, erased the RRD file and put it back, and I am *still* seeing gaps.
I swear my two tests are getting the exact same info:
What Xymon version are you running on the Xymon server? Anything before 4.3.4 has a known bug that can cause this.
Are there any errors logged in your rrd-status.log / rrd-data.log files?
You're sending the messages with a lifetime of 12 hours. How often are you sending these messages? If not once every 5 minutes, then the RRD file must have a non-standard "step" and "heartbeat" value. You can check this with "rrdtool info myfile.rrd". Here's one from my server:
filename = "/var/lib/xymon/rrd/jorn.hswn.dk/la.rrd"
rrd_version = "0003"
step = 300
<snip>
ds[la].minimal_heartbeat = 600
"step" is how often you're sending updates. "minimal_heartbeat" determines how much time may pass between updates before the data is considered invalid. In this case, if more than 600 seconds pass between two updates, then the data is not considered valid and will be ignored *unless* another update arrives within the next 600 seconds.
The "heartbeat" can be changed with "rrdtool tune". "step" cannot: you'll have to export the RRD file to XML, edit the XML file and then restore the RRD file from the XML.
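The check described above can be sketched in a few lines of Python. This is a simplified illustration, not RRDtool's actual consolidation logic: it reads "step" and "minimal_heartbeat" from "rrdtool info" text, flags consecutive update times that are further apart than the heartbeat, and builds (without running) an "rrdtool tune" command line. The function names and the example update times are hypothetical.

```python
def parse_rrd_info(info_text):
    """Parse 'key = value' lines from "rrdtool info" output into a dict."""
    info = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            info[key.strip()] = value.strip().strip('"')
    return info


def heartbeat_violations(update_times, heartbeat):
    """Return (previous, current) update-time pairs spaced more than
    `heartbeat` seconds apart."""
    ordered = sorted(update_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > heartbeat]


def tune_heartbeat_cmd(rrd_file, ds_name, heartbeat):
    """Build the argv for 'rrdtool tune FILE --heartbeat DS:SECONDS'."""
    return ["rrdtool", "tune", rrd_file, "--heartbeat", f"{ds_name}:{heartbeat}"]
```

Given the epoch times at which the client actually sent its updates (taken from the script's own log, for instance), heartbeat_violations shows whether any pair is spaced widely enough to explain a NaN row.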
Regards, Henrik
participants (3)
- betsy.schwartz@gmail.com
- henrik@hswn.dk
- tm@freedom.com