[xymon] Re: holes in trend graphs
Has anyone made progress with this? I have the exact same problem with no solution and nothing interesting in the logs. As soon as I disable caching on xymond_rrd, the gaps go away.
However, as soon as I did this, the graphs for "cpu load" and "users and processes" both stopped updating completely. All other trend graphs are perfect. None of the network tests had a problem, so this seems to be restricted to numbers coming from the xymon client as "client data".
-----Original Message----- From: "Stewart, Tom L." <Tom.Stewart (at) landsend.com> Date: Wed, 12 Jan 2011 09:18:03 -0600 To: xymon (at) xymon.com Subject: [xymon] Re: holes in trend graphs
I believe Henrik has found and fixed this issue, but I don't know if it is in the beta3 version. He has been talking about a new release.
Tom
-----Original Message----- From: Dominique Frise [mailto:dominique.frise (at) unil.ch] Sent: Wednesday, January 12, 2011 2:08 AM To: xymon (at) xymon.com Subject: [xymon] Re: holes in trend graphs
I posted a note last week saying I was having the same issue in the 4.3 release, and it looks like the same issue that was in the beta. The holes are totally random. I asked whether --no-cache still works the same way in xymond_rrd, but I have not seen anyone respond.
Tom
-----Original Message----- From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Jeremy Laidman Sent: Wednesday, March 23, 2011 9:44 PM To: xymon at xymon.com Subject: [Xymon] [xymon] Re: holes in trend graphs
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Tom, do you get the same as me with the "--no-cache" option:
- gaps for most trend graphs go away
- trend graphs for "cpu" and "users and processes" went to zero (actually NaN)
J
No, it seems I get fewer gaps using --no-cache, but it has always only affected the "cpu" and "users and processes" graphs. When I can catch the missing data in process, I see that the rrd is never updated or timestamped, but I never see any type of error message in any of the log files.
I also forgot to mention that I am only using one setting for the rrd definitions, and I do have the extra-rrd definition for the Solaris MP and Zone stats.
Here is what the configuration file looks like for the rrd... stuff in tasks.cfg.

# "rrdstatus" updates RRD files with information that arrives as "status" messages.
[rrdstatus]
    ENVFILE /home/xymon/server/etc/xymonserver.cfg
    NEEDS xymond
    CMD xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --no-cache --rrddir=$XYMONVAR/rrd

# "rrddata" updates RRD files with information that arrives as "data" messages.
[rrddata]
    ENVFILE /home/xymon/server/etc/xymonserver.cfg
    NEEDS xymond
    CMD xymond_channel --channel=data --log=$XYMONSERVERLOGS/rrd-data.log xymond_rrd --no-cache --rrddir=$XYMONVAR/rrd --extra-tests=mpstat,zonestat --extra-script=/home/xymon/server/ext/rrd_data.pl
And here is the rrddefinitions.cfg file for only what I have changed.
# This one is the default setup. You can change it, if you like.
[]
    # 576 datapoints w/ 5 minute interval = 48 hours @ 5 min avg.
    RRA:AVERAGE:0.5:1:82944
    # 576 datapoints w/ 6*5 minute averaged = 12 days @ 5 min avg.
    #RRA:AVERAGE:0.5:6:82944 **** NOT USED ****
    # 576 datapoints w/ 24*5 minute averaged = 48 days @ 5 min avg.
    #RRA:AVERAGE:0.5:24:82944 **** NOT USED ****
    # 576 datapoints w/ 288*5 minute averaged = 576 days @ 5 min avg.
    #RRA:AVERAGE:0.5:288:82944 **** NOT USED ****
Tom
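The comments in Tom's rrddefinitions.cfg still describe the stock 576-row archives, while the one active RRA line asks for 82944 rows. A quick calculation shows what each setting retains. This is only a sketch: the function name rra_days is made up for illustration, and the 300-second base step is taken from the "5 minute interval" wording in the comments.

```shell
#!/bin/sh
# rra_days ROWS STEPS_PER_ROW STEP_SECONDS
# Prints how many whole days of history an RRA with these settings holds.
rra_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

rra_days 82944 1 300    # Tom's active RRA line: prints 288
rra_days 576 1 300      # stock 5-minute archive: prints 2
```

So the single active archive keeps roughly 288 days of unaveraged 5-minute samples, rather than the 48 hours the leftover comment mentions.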
-----Original Message----- From: Jeremy Laidman [mailto:jlaidman at rebel-it.com.au] Sent: Monday, March 28, 2011 9:42 PM To: Stewart, Tom L. Cc: xymon at xymon.com Subject: Re: [Xymon] [xymon] Re: holes in trend graphs
On Tue, 29 Mar 2011 09:47:18 -0500, "Stewart, Tom L." <Tom.Stewart at landsend.com> wrote:
No, it seems I get fewer gaps using --no-cache, but it has always only affected the "cpu" and "users and processes" graphs.
I can see two possible causes for this.
There's a bug in the xymond_rrd module, so updates never make it to the rrd file.
There's some data missing from the client report, so there is no data to put into the rrd file. This can happen, e.g. if the client data message is too large so it gets truncated - which part of the client message is lost depends on the size of the message, and the sequence in which the individual sections (ps listing, network ports, log messages etc) are added to the client message.
I would like to try to see if the data really makes it into the RRD module. There is an undocumented option to xymond_rrd that causes all data that should go into the RRD files to be dumped to an external command; this should tell us whether any data shows up at all.
So create this little shell script:
#!/bin/sh
cat >/var/tmp/rrdfeed.txt
exit 0
Save it somewhere - /usr/local/bin/rrddump.sh - then add "--processor=/usr/local/bin/rrddump.sh" to the xymond_rrd commandline in tasks.cfg. It will log an entry to the rrd logfile that the processor has started. Each time an update occurs, it will write a line to the rrdfeed.txt file, containing (among other things) the RRD filename, the hostname, and the data that should go into the RRD file (which includes a timestamp). This is logged *before* any of the RRD cache handling occurs.
So grep'ing for the RRD filename after a while when there are holes in the graph should tell us if there are any data missing.
Regards, Henrik
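The setup Henrik describes can be tried out safely before touching tasks.cfg. The following is a minimal sketch, not part of his message: WORKDIR, DUMP and FEED are placeholder names, and a scratch directory stands in for /usr/local/bin and /var/tmp. The "made-up record" line is invented test input, not real xymond_rrd output.

```shell
#!/bin/sh
# Sketch only: WORKDIR stands in for /usr/local/bin and /var/tmp so the
# example can be tried without touching a live Xymon server.
WORKDIR=$(mktemp -d)
DUMP="$WORKDIR/rrddump.sh"
FEED="$WORKDIR/rrdfeed.txt"

# Same three-line script as in the message above, with the output path
# replaced by the scratch location.
cat >"$DUMP" <<EOF
#!/bin/sh
cat >"$FEED"
exit 0
EOF
chmod +x "$DUMP"    # the script has to be executable to be started

# Smoke test: pipe one invented line through the script on stdin.
printf 'made-up record\n' | "$DUMP"
cat "$FEED"         # prints: made-up record
```

Note that "cat >file" truncates the feed file every time the script is started, so an earlier capture is lost whenever the processor is restarted. Using ">>" instead would append, at the cost of a file that grows without limit.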
I have added the script and will send the info when it happens again.
Thank you, Tom
-----Original Message----- From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of henrik at hswn.dk Sent: Wednesday, March 30, 2011 4:13 AM To: xymon at xymon.com Subject: Re: [Xymon] [xymon] Re: holes in trend graphs
On Wed, Mar 30, 2011 at 8:12 PM, <henrik at hswn.dk> wrote:
#!/bin/sh
cat >/var/tmp/rrdfeed.txt
exit 0
... and "chmod +x" the file.
I have a gap in my graph for "memory.actual" right now.
$ strings /var/tmp/rrdfeed.txt |egrep " dnsadm2.in.*actual"|tail
realmempct 1301533439:4 dnsadm2.in.X.com.au memory actual
realmempct 1301533739:4 dnsadm2.in.X.com.au memory actual
realmempct 1301533739:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534039:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534339:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534639:4 dnsadm2.in.X.com.au memory actual

$ rrdtool fetch /var/lib/xymon/rrd/dnsadm2.in.X.com.au/memory.actual.rrd AVERAGE | tail
1301532300: 4.0000000000e+00
1301532600: 4.0000000000e+00
1301532900: 4.0000000000e+00
1301533200: 4.0000000000e+00
1301533500: 4.0000000000e+00
1301533800: nan
1301534100: nan
1301534400: nan
1301534700: nan
1301535000: nan
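Read side by side, the two outputs show that records kept arriving every 300 seconds through 1301534639 while the RRD holds nan from 1301533800 onward, and that the record for 1301533739 appears twice in the feed. RRDtool refuses an update whose timestamp is not later than the last one it stored, so a repeated timestamp is worth flagging, though whether it explains this gap is not established here. The sketch below scans feed lines for repeated timestamps and for long intervals. The function name check_feed is hypothetical, and the field layout (DS name, timestamp:value, host, test, instance) is inferred from the output above rather than from any documented format.

```shell
#!/bin/sh
# check_feed [MAXGAP]
# Reads feed lines on stdin, assumed to look like
#   <ds-name> <timestamp>:<value> <host> <test> <instance>
# and reports timestamps that do not increase, plus intervals longer
# than MAXGAP seconds (default 600, i.e. one missed 5-minute update).
check_feed() {
    awk -v maxgap="${1:-600}" '
    {
        split($2, tv, ":")      # tv[1] = epoch timestamp, tv[2] = value
        ts = tv[1] + 0
        if (NR > 1) {
            if (ts <= prev)
                printf "non-increasing timestamp %d (previous %d)\n", ts, prev
            else if (ts - prev > maxgap)
                printf "gap of %d s before %d\n", ts - prev, ts
        }
        prev = ts
    }'
}

# The six records from the output above.
check_feed <<'EOF'
realmempct 1301533439:4 dnsadm2.in.X.com.au memory actual
realmempct 1301533739:4 dnsadm2.in.X.com.au memory actual
realmempct 1301533739:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534039:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534339:4 dnsadm2.in.X.com.au memory actual
realmempct 1301534639:4 dnsadm2.in.X.com.au memory actual
EOF
# prints: non-increasing timestamp 1301533739 (previous 1301533739)
```

On a live system the input would come from the same strings and egrep pipeline used above, filtered to a single host and test so that consecutive lines belong to the same RRD file.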
On Thu, Mar 31, 2011 at 12:29 PM, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
I have a gap in my graph for "memory.actual" right now.
Hmm. In the end, I did some adjustments to my two xymon servers to only list themselves as display servers, rather than each listing both servers. Otherwise I was getting multiple messages passing from one to the other, and I reckon this was causing problems, possibly this one too.
After a reboot, I lost the "--processor" script, and since removing the configuration for it, I now get graphs on both servers, and neither has gaps. I'm happy, even though I'm not entirely sure of the cause. But I suspect it's a bad thing to list more than one display server on a display server.
Cheers Jeremy
I have the same issue, and I have sent Henrik some data. I know he has been occupied with the eye stuff, so I didn't want to send additional data until he is better.
Tom
-----Original Message----- From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Jeremy Laidman Sent: Thursday, May 05, 2011 10:27 PM To: xymon at xymon.com Subject: Re: [Xymon] [xymon] Re: holes in trend graphs
participants (3)
- henrik@hswn.dk
- jlaidman@rebel-it.com.au
- Tom.Stewart@landsend.com