diskstat.sh/RRD oddity

sholmes42＠mac.com

28 Mar 2012

10:02 p.m.

This is just a comment on an oddity with respect to diskstat.sh and RRD.

We make pretty heavy use of the diskstat.sh script, which I believe I downloaded from xymonton. When I installed it I used the standard clientlaunch.cfg stanza for the configuration and everything worked great.

I was called to task today because we have been having some disk I/O issues on the RHEL VMs. Someone was looking at the trend graphs for some servers to see if there was anything they could learn, and they noticed that beginning at about 4pm local time on Monday the graphs for the number of sectors written per second on a couple of file systems on several VMs jumped from the 10 to 20 range to the 300 to 340 range and stayed there. The graph for number of disk writes per second had a corresponding jump up to about 40 or 50 from close to zero.

In analyzing the data I discovered that the file system displaying this behavior is the same file system to which the diskstat.sh script writes its temp files. It appears that, for some reason, starting at 4pm on Monday the 5 minute test interval and the 5 minute average for RRD got in sync, so all the script was seeing was the data point that corresponded to its own writing activity, and RRD was using it for the entire 5 minute average (of course, that's what RRD does).

I 'fixed' it by changing the test interval to something less than 5 minutes. I tried 2, 3, and 4 minutes and they all had the effect of reducing the data in the plot back to the expected level, i.e. to the level it was before 4pm on Monday.

The mystery remains why it suddenly started seeing and using its own disk activity at the same time on several different servers.

Steve Holmes ITaP/Purdue University

-- If they give you ruled paper, write the other way. -Juan Ramon Jimenez, poet, Nobel Prize in literature (1881-1958)

I prayed for freedom for twenty years, but received no answer until I prayed with my legs. -Frederick Douglass, Former slave, abolitionist, editor, and orator (1817-1895)


everett.vernon＠gmail.com

29 Mar 2012

1:46 a.m.

Hi Steve

I think this is the script that I wrote. Apologies that it got you into a bit of a mess, but I am quite thrilled to hear that it worked as-is on RHEL. It was originally written for Solaris 10, and I made no effort to test it on anything else.

The other fix I would suggest is to change the sample time. Change the DURATION variable at the top. The script takes a default 10-second "sample" of disk activity, and uses those values for graphing. It just may be that, through a curious alignment of system times, multiple clients were sending their data to the Xymon server at the same time as your system was sampling the IO. Change the sample to something like 30 seconds, and you might find you get a better average.

This was one of the risks with that script. Make the sample time too long, and we see too much of an average - very smooth graph. Too short, and we might pick up peaks (as in your case).
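As a rough illustration of this trade-off, here is a minimal Python sketch. The numbers are made up (a steady 15 sectors/s of background writes plus a 10-second burst standing in for the script's own temp-file activity), and `sampled_rate` is a hypothetical helper, not part of diskstat.sh. It only shows how a short sample window that happens to line up with a burst reports a value far above the true 5-minute average.

```python
def sampled_rate(per_second, start, duration):
    """Mean of the per-second values in the window [start, start + duration)."""
    window = per_second[start:start + duration]
    return sum(window) / len(window)


# Hypothetical 5-minute test interval: 15 sectors/s of background writes...
interval = [15.0] * 300
# ...plus an assumed burst of 300 sectors/s during the first 10 seconds.
for second in range(10):
    interval[second] += 300.0

true_average = sum(interval) / len(interval)   # what actually happened
short_sample = sampled_rate(interval, 0, 10)   # 10-second sample on the burst
long_sample = sampled_rate(interval, 0, 30)    # 30-second sample, burst diluted

print(true_average, short_sample, long_sample)  # 25.0 315.0 115.0
```

With these assumed figures the 10-second sample reports 315 sectors/s even though the interval averaged 25, and whatever single value is sampled is what gets graphed for the whole 5 minutes. A longer sample dilutes the burst but does not remove it.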

What I was originally looking for when I wrote the script, was sustained high IO, in which case, any sample size would have done the trick. So, for me, 10 seconds was as good a value as any, but feel free to experiment. YMMV. If you find some settings give significantly better results than others, feel free to add these notes to the Description or Known Bugs & Issues sections on Xymonton. And while you are there, if you can update the Compatibility entry to include your OS, that would be great.

Regards Vernon


Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon

-- "While it is futile to try to eliminate risk, and questionable to try to minimize it, it is essential that the risks taken be the right risks. "

Peter F. Drucker

Don.Kuhlman＠schawk.com

11 Oct 2012

4:51 p.m.

Hi folks. I saved this email from early this year. I planned on using your script to produce disk I/O stats in Xymon. Thanks for creating and sharing it! I have finally gotten to the point of adding it to our Xymon installation. My question is – can we include these graphs in the "disk" column's graphs vs the trends column? If so, what's the best way to do that?

Thanks

Don K


jlaidman＠rebel-it.com.au

12 Oct 2012

5:42 a.m.

On 12 October 2012 03:51, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:

...

My question is – can we include these graphs in the "disk" column's graphs vs the trends column ?

Anything is possible.

...

If so, what's the best way to do that?

You could replace the graph.cfg entry for [disk] with one of the graph definitions provided with the script (e.g. [diskstat-reads]). But then you would lose existing graphs (if any) of disk space utilisation.

I want them both, so I have a separate "diskio" dot (actually fed in from devmon, but the principle is the same).

Don.Kuhlman＠schawk.com

15 Oct 2012

3:49 p.m.

Thanks much Jeremy!

That's what I wanted to confirm: if I customize the [disk] graph, we would lose those default graphs, rather than just adding the IO stats to the same [disk] graphs.

PS – I'm using devmon too, but just getting into it. It sounds like devmon may give you better/other IO stats vs making custom Xymon graphs for disk io from the OS information provided.

Regards,

Don K


jlaidman＠rebel-it.com.au

16 Oct 2012

1:20 a.m.

On 16 October 2012 02:49, Don Kuhlman <Don.Kuhlman at schawk.com> wrote:

...

PS – I'm using devmon too, but just getting into it. It sounds like devmon may give you better/other IO stats vs making custom Xymon graphs for disk io from the OS information provided.

I'd much prefer to use Xymon to collect the iostat disk details (into [iostatdisk] client data), and there's code in there to do this for Solaris. However, due to differences in how various OSes show their iostat output, Xymon support is not universal (yet). Perhaps, now that "sar" is more widely used, and generally consistent among OSes, this would be a way forward to universal support for disk iostat graphs.

I have nothing against devmon - I use it extensively. It's just that I would prefer to install and manage (and argue management for) one piece of software instead of two.

Don.Kuhlman＠schawk.com

18 Oct 2012

4:21 p.m.

Thanks Jeremy. Your input is much appreciated.

I agree and would rather perform all tasks I can directly with Xymon.

Don


Wim.Nelis＠nlr.nl

29 Mar 2012

7:28 a.m.

Hello,


The diskstat.sh script collects a sample at each invocation. Thus the measurements cover only a fraction of the time. As you seem to be using RHEL, you could use /proc/diskstats to get the same data, but instead of a statistical sample, you will get averages since the last invocation (measurement). That would prevent the "positive" interference described above, in which you measure your own measurement activities.
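A minimal Python sketch of the /proc/diskstats approach follows. It assumes the usual Linux layout, where the fields after the device name start with reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written. The helper names `parse_diskstats` and `write_rates` and the two snapshot lines are invented for illustration; this is not code from diskstat.sh.

```python
def parse_diskstats(text):
    """Map device name -> (writes_completed, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        # fields[2] is the device name; fields[7] and fields[9] are the
        # cumulative writes completed and sectors written since boot.
        stats[fields[2]] = (int(fields[7]), int(fields[9]))
    return stats


def write_rates(previous, current, elapsed_seconds):
    """Per-second write and sector rates averaged over the whole interval."""
    rates = {}
    for device, (writes, sectors) in current.items():
        if device not in previous:
            continue  # device appeared between snapshots; no baseline yet
        prev_writes, prev_sectors = previous[device]
        rates[device] = (
            (writes - prev_writes) / elapsed_seconds,
            (sectors - prev_sectors) / elapsed_seconds,
        )
    return rates


# Two made-up snapshots taken 300 seconds apart.
before = "   8       0 sda 1000 10 50000 400 2000 20 90000 800 0 600 1200\n"
after = "   8       0 sda 1100 10 52000 420 2600 25 99000 900 0 650 1320\n"

rates = write_rates(parse_diskstats(before), parse_diskstats(after), 300)
print(rates)  # {'sda': (2.0, 30.0)}
```

On a real host the text would come from reading /proc/diskstats at each invocation, with the previous snapshot kept somewhere between runs. Because the counters are cumulative, the result covers the whole interval rather than a 10-second window.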

Regards, Wim Nelis.


sholmes42＠mac.com

3:23 p.m.

Thanks for the tips.

Vernon, We are using a version of your script that was posted to the list and begins with an if statement to detect the running OS as Solaris or not. The code on Xymonton appears to be the original version which does not work on Linux. But the tip still applies.

Wim, I had not looked at /proc/diskstats before, but it returns the total number of reads, writes, etc, since system boot, so I would have to do some coding and remember previous values to get what we get from iostat. At this point I have things working ok so I don't feel the need to write another script.

Thanks, Steve


jlaidman＠rebel-it.com.au

30 Mar 2012

4:39 a.m.

On Fri, Mar 30, 2012 at 2:23 AM, Steve Holmes <sholmes42 at mac.com> wrote:

...

Wim, I had not looked at /proc/diskstats before, but it returns the total number of reads, writes, etc, since system boot, so I would have to do some coding and remember previous values to get what we get from iostat. At this point I have things working ok so I don't feel the need to write another script.

No you wouldn't. Just use a COUNTER instead of a GAUGE and RRD will automatically track the previous value and show the difference.
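To make the COUNTER idea concrete, here is a small Python sketch of the arithmetic as I understand it from the rrdtool documentation: the stored value is the difference between consecutive readings divided by the elapsed time, with a correction when the counter appears to have wrapped at 32 or 64 bits. `counter_rate` is a hypothetical function written for illustration, not rrdtool's own code.

```python
def counter_rate(previous_value, current_value, elapsed_seconds):
    """Per-second rate between two readings of an ever-increasing counter."""
    delta = current_value - previous_value
    if delta < 0:
        # The counter went backwards: assume it wrapped at the 32-bit boundary.
        delta += 2**32
    if delta < 0:
        # Still negative: assume a 64-bit wrap instead.
        delta += 2**64 - 2**32
    return delta / elapsed_seconds


# Cumulative "sectors written" values read 300 seconds apart (made-up numbers).
print(counter_rate(90000, 99000, 300))      # 30.0
# A 32-bit counter that wrapped between the two readings.
print(counter_rate(2**32 - 100, 200, 300))  # 1.0
```

One caveat with this scheme: a reboot resets the /proc/diskstats totals to zero, which looks like a wrap and can produce a single large spike. The rrdtool documentation describes the DERIVE type with a minimum of 0 as a way to avoid that.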

