[hobbit] vmstat graphing with CPU io wait
Henrik,
AIX also reports I/O wait in its vmstat output, in column 16 ("wa" under cpu). It would be nice to have that in the graphs.
kthr  memory        page               faults       cpu
 r  b    avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  2 238300 239351  0  0  0 60 108  0 110   79  83  8 30 30 32
 0  2 238803 238827  0  0  0  0   0  0 498 3066 334  1  2 97  1
Chris
-----Original Message-----
From: Henrik Stoerner [SMTP:henrik at hswn.dk]
Sent: Tuesday, January 25, 2005 5:04 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] vmstat graphing with CPU io wait
On Tue, Jan 25, 2005 at 08:27:37AM -0500, Tom Georgoulias wrote:
Henrik Storner wrote:
Where do you get the I/O wait information from ?
On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu"
Aha! So that's it - I had been wondering a bit why my load graphs didn't always add up to 100% !
This is quite interesting, and definitely something that should be tracked. So I hope you don't mind that I've tried adding it myself ...
One annoying bit with the RRD files is that changing the dataset (e.g. adding an extra variable) is not possible. So adding the cpu_wait data will break any existing vmstat data that has been collected. So if we're gonna break the vmstat RRD layout for Linux clients, we might as well do it now before the official release. And that should also include getting the very old layout (the one from Linux 2.2 kernels, with the "r b w" process counts) aligned with the new layout - effectively creating a single vmstat RRD format regardless of what Linux version you are running.
So: I've modified the Linux vmstat RRD layout to always include the "cpu_w" (from the very old vmstat version) and "cpu_wait" columns (from the latest vmstat versions). If the client doesn't report a value for these, they are set to the special RRD-value "undefined". So when someone upgrades a system from Linux 2.2 to 2.4, or from 2.4 to 2.6, the vmstat data will still work.
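A minimal C sketch of that idea, not the actual Hobbit code: each dataset in a layout table names the vmstat column it reads, and a column index of -1 (or one beyond what the client sent) is written as "U", which "rrdtool update" treats as unknown. The struct field names and the function build_rrd_values() are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Field names are assumptions; the thread only shows the initializers. */
typedef struct {
    int index;          /* 0-based vmstat column, or -1 if not reported */
    const char *name;   /* RRD dataset name; NULL terminates the table  */
} vmstat_layout_t;

/*
 * Build an rrdtool-style update string ("N:v1:v2:...") from the parsed
 * vmstat columns. A dataset whose column is -1, or lies beyond ncols,
 * is written as "U" (unknown). Returns 0 on success, -1 if buf is too small.
 */
int build_rrd_values(const vmstat_layout_t *layout,
                     const char *const *cols, int ncols,
                     char *buf, size_t bufsz)
{
    size_t used;
    int n = snprintf(buf, bufsz, "N");

    if (n < 0 || (size_t)n >= bufsz) return -1;
    used = (size_t)n;

    for (; layout->name != NULL; layout++) {
        const char *val = "U";

        if (layout->index >= 0 && layout->index < ncols)
            val = cols[layout->index];

        n = snprintf(buf + used, bufsz - used, ":%s", val);
        if (n < 0 || (size_t)n >= bufsz - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

With a table that lists cpu_w at index -1, an old client that sends the "w" column and a new client that does not both produce a string with the same number of fields, so one RRD layout can serve both.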
I've also defined a "vmstat1" graph similar to the normal "vmstat" graph, but with the cpu_wait data added (it stacks on top of the "system" time, below "user" time).
Some sample graphs (they don't have any data yet, so you're probably better off waiting a couple of hours before you view them):
Linux 2.6 host: http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=vmstat1&graph=hourly
Linux 2.4 host: http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=tyge.sslug.dk&service=vmstat1&graph=hourly
Linux 2.2 host (actually 2.4, but an old vmstat version): http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=fenris.hswn.dk&service=vmstat1&graph=hourly
Henrik
On Wed, Jan 26, 2005 at 09:29:31AM -0000, Morris, Chris (Shared Services) wrote:
AIX also reports i/o wait in its vmstat output in column 16 under wa of cpu which it would be nice to have in the graphs.
kthr  memory        page               faults       cpu
 r  b    avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  2 238300 239351  0  0  0 60 108  0 110   79  83  8 30 30 32
 0  2 238803 238827  0  0  0  0   0  0 498 3066 334  1  2 97  1
Yes, I noticed that when I worked on the vmstat graphs yesterday. This was already being collected for AIX, so I made sure the names matched, so that the graph-definitions will work for both Linux and AIX.
You can try it with the AIX data you have. Add this to hobbitgraph.cfg - it's the definition for "vmstat1" I wrote yesterday. It gives you a graph with the CPU usage split into system, I/O wait, user and idle:
[vmstat1]
    TITLE CPU Utilization
    YAXIS % Load
    -u 100
    -r
    DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
    DEF:cpu_usr=vmstat.rrd:cpu_usr:AVERAGE
    DEF:cpu_sys=vmstat.rrd:cpu_sys:AVERAGE
    DEF:cpu_wait=vmstat.rrd:cpu_wait:AVERAGE
    AREA:cpu_sys#FF0000:System
    STACK:cpu_wait#774400:I/O wait
    STACK:cpu_usr#FFFF00:User
    STACK:cpu_idl#00FF00:Idle
    COMMENT:\n
    GPRINT:cpu_sys:LAST:System \: %5.1lf (cur)
    GPRINT:cpu_sys:MAX: \: %5.1lf (max)
    GPRINT:cpu_sys:MIN: \: %5.1lf (min)
    GPRINT:cpu_sys:AVERAGE: \: %5.1lf (avg)\n
    GPRINT:cpu_wait:LAST:I/O Wait\: %5.1lf (cur)
    GPRINT:cpu_wait:MAX: \: %5.1lf (max)
    GPRINT:cpu_wait:MIN: \: %5.1lf (min)
    GPRINT:cpu_wait:AVERAGE: \: %5.1lf (avg)\n
    GPRINT:cpu_usr:LAST:User \: %5.1lf (cur)
    GPRINT:cpu_usr:MAX: \: %5.1lf (max)
    GPRINT:cpu_usr:MIN: \: %5.1lf (min)
    GPRINT:cpu_usr:AVERAGE: \: %5.1lf (avg)\n
    GPRINT:cpu_idl:LAST:Idle \: %5.1lf (cur)
    GPRINT:cpu_idl:MAX: \: %5.1lf (max)
    GPRINT:cpu_idl:MIN: \: %5.1lf (min)
    GPRINT:cpu_idl:AVERAGE: \: %5.1lf (avg)\n
Now find one of your AIX boxes on the Hobbit webpages and look at the vmstat graphs. Then, in the browser change the part of the URL that says "service=vmstat" to "service=vmstat1". You should then see the new graph.
Or put "LARRD:*,vmstat:vmstat1" in the AIX-hosts' entry in bb-hosts and wait for bb-larrdcolumn to update the set of graphs shown by default.
Henrik
Henrik,
Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out?
Tom
On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:
Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out?
I plan on putting out a "release candidate" tomorrow.
There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/ which has the vmstat changes; applies on top of beta-6.
After patching, run "make" and "make install", then restart hobbit (or at least hobbitd_larrd - if you just kill it, then hobbitlaunch will restart it automatically).
Make sure you copy over the new hobbitd/etcfiles/hobbitgraph.cfg file to ~hobbit/server/etc/
You also need to delete the existing ~hobbit/data/rrd/*/vmstat.rrd files (at least those from Linux systems), or you will get a lot of errors that it cannot update the vmstat.rrd file. Check the larrd-status.log and larrd-data.log files.
Henrik
Henrik Stoerner wrote:
On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:
Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out?
I plan on putting out a "release candidate" tomorrow.
There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/ which has the vmstat changes; applies on top of beta-6.
Thanks for providing the patch. I applied it and it built without any errors, but I'm still having problems getting it to work. I did copy over the new hobbitgraph.cfg file after installing & deleted the vmstat.rrd for the linux system in question before restarting.
So, my first question: I was looking at the patch and wasn't sure the array order is correct. (I'm not a programmer by any means, so if I'm wrong just say so).
on RHEL3, vmstat's CPU info columns are in this order:
user - 12th
system - 13th
IO wait - 14th
idle - 15th
For example (pardon the line wrap):
-bash-2.05b$ vmstat 2
procs memory                     swap  io     system  cpu
 r  b swpd   free   buff   cache si so  bi bo  in  cs us sy wa id
 0  1    0  19036  27412 4370032  0  0 214  0 622 649  0  1 50 48
in the patch, you have cpu_idl =14 & cpu_wait=15. Is that backwards? Or am I out of my league (disclaimer: I hardly know anything about C programming).
static vmstat_layout_t vmstat_linux_layout[] = {
    { 0, "cpu_r" },
    { 1, "cpu_b" },
    { -1, "cpu_w" },     /* Not present for 2.4+ kernels, so log as "Undefined" */
    { 2, "mem_swpd" },
    { 3, "mem_free" },
    { 4, "mem_buff" },
    { 5, "mem_cach" },
    { 6, "mem_si" },
    { 7, "mem_so" },
    { 8, "dsk_bi" },
    { 9, "dsk_bo" },
    { 10, "cpu_int" },
    { 11, "cpu_csw" },
    { 12, "cpu_usr" },
    { 13, "cpu_sys" },
    { 14, "cpu_idl" },
    { 15, "cpu_wait" },  /* Requires kernel 2.6, but may not be present */
    { -1, NULL }
};
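One way to check which position a column really has on a given system is to look it up by name in the vmstat header line instead of counting by hand. The helper below is only an illustrative sketch; find_column() is a made-up name and is not part of the patch. It returns the first match, so it would need extra care on headers where a label repeats (the AIX header has "sy" twice).

```c
#include <string.h>

/*
 * Return the 0-based position of column `name` in a whitespace-separated
 * vmstat header line, or -1 if the column is absent. First match wins.
 */
int find_column(const char *header, const char *name)
{
    static const char delims[] = " \t\r\n";
    size_t namelen = strlen(name);
    const char *p = header;
    int idx = 0;

    for (;;) {
        size_t toklen;

        p += strspn(p, delims);        /* skip blanks before the token */
        if (*p == '\0') return -1;     /* ran out of columns */

        toklen = strcspn(p, delims);   /* length of this column label */
        if (toklen == namelen && strncmp(p, name, namelen) == 0)
            return idx;

        p += toklen;
        idx++;
    }
}
```

For the RHEL3 header shown above, this gives 14 for "wa" and 15 for "id", which matches the order listed in the message.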
On Wed, Jan 26, 2005 at 11:47:30AM -0500, Tom Georgoulias wrote:
So, my first question: I was looking at the patch and wasn't sure the array order is correct. (I'm not a programmer by any means, so if I'm wrong just say so).
on RHEL3, vmstat's CPU info columns are in this order:
user - 12th
system - 13th
IO wait - 14th
idle - 15th
Argh! They swapped the order of the IO wait and idle counters!
Well, the simple way of fixing that is to just switch them around in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that does get rather messy.
So it's probably better to define RHEL3 as a new OS type, and set up its own table for mapping the numbers to the RRD data.
Patch - on top of the previous one - attached. It compiles, but I haven't tested it. It assumes your vmstat data sends in "rhel3" as the name of the OS.
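The attached patch is not reproduced in the archive. As a rough sketch of what such a per-OS table might look like, here is the Linux layout with the wait and idle positions swapped to match the RHEL3 column order reported above. The name vmstat_rhel3_layout, the struct field names and the layout_index() helper are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Field names are assumptions; the thread only shows the initializers. */
typedef struct {
    int index;          /* 0-based vmstat column, -1 = not reported */
    const char *name;   /* RRD dataset name; NULL ends the table    */
} vmstat_layout_t;

/* Hypothetical RHEL3 table: I/O wait is column 14, idle is column 15. */
static const vmstat_layout_t vmstat_rhel3_layout[] = {
    { 0, "cpu_r" },
    { 1, "cpu_b" },
    { -1, "cpu_w" },      /* no "w" column in this vmstat version */
    { 2, "mem_swpd" },
    { 3, "mem_free" },
    { 4, "mem_buff" },
    { 5, "mem_cach" },
    { 6, "mem_si" },
    { 7, "mem_so" },
    { 8, "dsk_bi" },
    { 9, "dsk_bo" },
    { 10, "cpu_int" },
    { 11, "cpu_csw" },
    { 12, "cpu_usr" },
    { 13, "cpu_sys" },
    { 14, "cpu_wait" },   /* swapped relative to the generic Linux table */
    { 15, "cpu_idl" },
    { -1, NULL }
};

/*
 * Look up the vmstat column that feeds dataset `name`.
 * Returns -1 both for unknown names and for datasets marked unreported.
 */
int layout_index(const vmstat_layout_t *layout, const char *name)
{
    for (; layout->name != NULL; layout++) {
        if (strcmp(layout->name, name) == 0)
            return layout->index;
    }
    return -1;
}
```

Because the dataset names stay the same and only the column numbers differ, graph definitions that use cpu_idl or cpu_wait would not need to change.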
Henrik
Henrik Stoerner wrote:
on RHEL3, vmstat's CPU info columns are in this order:
user - 12th
system - 13th
IO wait - 14th
idle - 15th
Argh! They swapped the order of the IO wait and idle counters!
Frustrating, huh? And I'll bet it'll match Fedora's and others when procps gets updated in a future batch of errata. :(
Well, the simple way of fixing that is to just switch them around in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that does get rather messy.
So it's probably better to define RHEL3 as a new OS type, and set up its own table for mapping the numbers to the RRD data.
That's what I've been doing. One problem that remains for me when doing this, or maybe there for other OSes as well, is the continued use of the "vmstat" graph in the vmstat status page. I'm going to try and adjust that so the rhel3 systems use vmstat1 and other OSes use whatever they need.
Patch - on top of the previous one - attached. It compiles, but I haven't tested it. It assumes your vmstat data sends in "rhel3" as the name of the OS.
I was going to share the patch I created, which looks almost the same, but I went ahead and used yours instead, though, just to be in sync with your sources.
Tom
Tom Georgoulias wrote:
Patch - on top of the previous one - attached. It compiles, but I haven't tested it. It assumes your vmstat data sends in "rhel3" as the name of the OS.
I was going to share the patch I created, which looks almost the same, but I went ahead and used yours instead, though, just to be in sync with your sources.
I think I spoke too soon. My Red Hat 7.1/7.3 systems need to use the same layout as debian3, so I had that in my patch. I also created the cpu_wait column for my freebsd systems, but left it undefined so that every system could use the same vmstat graph. For those that track IOwait, it'll use it. For those that do not, the parameter will show up in the legend and keep the value "nan". Not the prettiest, but much easier to maintain. Patch is attached, which relies on yours already being in place, in case you are interested. I hesitate to push for inclusion since RH 8.0, 9 and whatever else is out there may report their BBOSNAME as "redhat" but use a different vmstat, plus not everyone wants their graphs to include a parameter that might not exist. It's out there for whoever wants to use it.
I also included a simple, tiny patch to add an echo statement for starthobbit.sh that tells the user hobbit is stopped, much like the message displayed when starting. I put it there as a way to clarify what is happening when the rest of my team starts messing around with hobbit. I'm thinking of creating a new symlink called "runhobbit.sh", just to match the old BB style and to try and avoid any confusion that may go along with a command that looks like this: "starthobbit.sh stop".
Tom
participants (3)
- CHRIS.MORRIS@RWEnpower.com
- henrik@hswn.dk
- tgeorgoulias@nandomedia.com