Hopefully I can explain this properly because I don't really understand what's going on.
I'm running Xymon 4.3.30 (Terabithia RPMs) on CentOS 6.10.
I have an external server that reports the number of files waiting to be processed by our internal app. We want this number to be 1 (0 means a problem, more than 1 means a queue is building). For testing purposes I'm overriding the number using FILE_COUNT="$(( ( RANDOM % 31 ) ))" to give me any number between 0 and 30.
I see this reflected correctly in my XYMONSRV status page. Every 5 minutes it shows the new value and changes colour accordingly, but I don't always see the associated RRD get updated.
Here's what I see if I tcpdump on the client (and grep for TTlogs). It is sent every 5 minutes without fail:
.status myserver.TTlogs red The count is too high. Data corruption may occur if this reaches 30. [ fileCount:20 ]
However, my associated RRD file hasn't updated in nearly 3 hours:
$ date
Wed Nov 6 18:30:04 GMT 2024
$ ls -l TTlogs.rrd
-rw-r--r-- 1 xymon xymon 217352 Nov 6 15:49 TTlogs.rrd
My troubleshooting of RRD is weak but maybe this will help.
The timestamp that 'rrdtool lastupdate' shows is 1730905160, which is 14:59:20 GMT on November 6 (the file itself shows 15:49):
$ rrdtool lastupdate TTlogs.rrd
fileCount
1730905160: 1
Fetching the LAST readings gives (with comments by me):
$ rrdtool fetch TTlogs.rrd LAST
fileCount
1730831700: -nan (time is November 5, 2024, at 18:35:00 GMT)
...same reading at 300-second intervals until...
1730844300: -nan
1730844600: 1.0000000000e+00 (time is November 5, 2024, at 22:10:00 GMT)
...same at 300-second intervals...
1730904900: 1.0000000000e+00 (time is November 6, 2024, at 14:55:00 GMT)
1730905200: -nan (time is November 6, 2024, at 15:00:00 GMT)
...same at 300-second intervals...
1730918100: -nan (time is November 6, 2024, at 18:35:00 GMT - in the future?)
So it was updating with 1 (the real reading) from 22:10:00 GMT on November 5 to 14:55:00 GMT on November 6, then went back to -nan, and it refuses to update even though I'm changing the reading every 5 minutes.
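The epoch values in the rrdtool output can be cross-checked from the shell. This is a minimal sketch, assuming GNU date (the `-d @EPOCH` form, which other date implementations may not accept); the function name `epoch_to_utc` is made up for illustration.

```shell
#!/bin/sh
# Convert a Unix epoch value to a UTC wall-clock string.
# Assumption: GNU date, which accepts "-d @EPOCH".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Epoch values copied from the rrdtool lastupdate/fetch output in this post.
for ts in 1730831700 1730844600 1730904900 1730905160 1730905200 1730918100; do
    printf '%s  %s UTC\n' "$ts" "$(epoch_to_utc "$ts")"
done
```

For 1730905160 this prints 2024-11-06 14:59:20, the same time rrdtool writes in the comment next to lastupdate in the dump.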
TTlogs.rrd was generated automatically so it should be set up with whatever the Xymon defaults are.
My xymon config for TTlogs is an external file in xymonserver.cfg.d with:
TEST2RRD+=",TTlogs=ncv"
GRAPHS+=",TTlogs"
NCV_TTlogs="fileCount:GAUGE"
I'm not exactly sure what this tells me, if anything, but maybe it helps:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
<version>0003</version>
<step>300</step> <!-- Seconds -->
<lastupdate>1730905160</lastupdate> <!-- 2024-11-06 14:59:20 GMT -->
<ds>
<name> fileCount </name>
<type> GAUGE </type>
<minimal_heartbeat>600</minimal_heartbeat>
<min>NaN</min>
<max>NaN</max>
<!-- PDP Status -->
<last_ds>1</last_ds>
<value>2.6000000000e+02</value>
<unknown_sec> 0 </unknown_sec>
</ds>
<!-- Round Robin Archives -->
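One thing the dump allows is a quick comparison of the time since the last update against the minimal_heartbeat of 600 seconds. Once the gap between updates exceeds the heartbeat, RRDtool stores unknown (-nan) for that interval. The sketch below is only illustrative: `rrd_gap` is a hypothetical helper, and 1730917804 is my own conversion of the "Wed Nov 6 18:30:04 GMT 2024" date output to an epoch value.

```shell
#!/bin/sh
# Report how long ago an RRD was last updated and whether that gap
# exceeds the data source heartbeat.
# Arguments: last-update epoch, current epoch, heartbeat in seconds.
rrd_gap() {
    last=$1
    now=$2
    heartbeat=$3
    gap=$(( now - last ))
    if [ "$gap" -gt "$heartbeat" ]; then
        echo "gap=${gap}s heartbeat=${heartbeat}s state=stale"
    else
        echo "gap=${gap}s heartbeat=${heartbeat}s state=ok"
    fi
}

# lastupdate from the dump, and the time the "date" command was run
# (1730917804 is derived by hand from 18:30:04 GMT, not tool output).
rrd_gap 1730905160 1730917804 600
```

Against the live file, something like rrd_gap "$(rrdtool last TTlogs.rrd)" "$(date +%s)" 600 should give the current figure. With the numbers in this post, the gap is 12644 seconds, roughly 42 missed 300-second steps.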
I'm kind of lost on what is going on. Any ideas of how to troubleshoot further?
Any help or advice would be gratefully received.