On Tue, 29 Mar 2011 09:47:18 -0500, "Stewart, Tom L." <Tom.Stewart at landsend.com> wrote:
No, it seems I get fewer gaps using --no-cache, but it has always only affected the "cpu" and "users and processes" graphs.
I can see two possible causes for this.
There's a bug in the xymond_rrd module, so updates never make it to the rrd file.
There's some data missing from the client report, so there is no data to put into the rrd file. This can happen, for example, if the client data message is too large and gets truncated. Which part of the client message is lost depends on the size of the message and on the sequence in which the individual sections (ps listing, network ports, log messages etc) are added to it.
I would like to check whether the data really makes it into the RRD module. There is an undocumented option to xymond_rrd that causes all data that should go into the RRD files to be dumped to an external command - this should tell us if any data shows up at all.
So create this little shell script:
#!/bin/sh
cat >/var/tmp/rrdfeed.txt
exit 0
Save it somewhere - e.g. /usr/local/bin/rrddump.sh - and make sure it is executable, then add "--processor=/usr/local/bin/rrddump.sh" to the xymond_rrd command line in tasks.cfg. xymond_rrd will log an entry to the rrd logfile saying that the processor has started. Each time an update occurs, a line is written to the rrdfeed.txt file, containing (among other things) the RRD filename, the hostname, and the data that should go into the RRD file (which includes a timestamp). This is logged *before* any of the RRD cache handling occurs.
So once there are holes in the graph again, grep'ing rrdfeed.txt for the RRD filename should tell us whether any data is missing.
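
A rough sketch of that check, in case it helps. The helper names rrd_updates and rrd_count are made up for illustration, and since the exact layout of each line in the dump is not documented, this only does fixed-string matching on the RRD filename and the hostname:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the dump written by rrddump.sh.
# Nothing is assumed about the line layout beyond "the RRD filename and
# the hostname appear somewhere on the line".

# Print every dumped line that mentions both the RRD filename and the host.
rrd_updates() {
    feed=$1   # dump file, e.g. /var/tmp/rrdfeed.txt
    rrd=$2    # RRD filename to look for (whatever file backs the graph)
    host=$3   # hostname as Xymon knows it
    grep -F -- "$rrd" "$feed" | grep -F -- "$host"
}

# Count those lines, to compare against the number of updates expected
# for the period covered by the dump.
rrd_count() {
    rrd_updates "$@" | wc -l
}
```

For example, "rrd_updates /var/tmp/rrdfeed.txt la.rrd myhost | tail" would show the most recent dumped updates for one host. Here la.rrd as the file behind the "cpu" graph and myhost are assumptions; substitute the names from your own setup. If the timestamps in those lines have no holes while the graph does, the data is being lost after it reaches xymond_rrd; if the lines themselves are missing, the data never arrived.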
Regards, Henrik