top ten list of servers wrt cpu load

18 Mar 2013


Before I go inventing something, I want to find out if anyone has already
done this.
We have a lot of virtual linux hosts (VMs on an ESX farm). We monitor all
of them with Xymon. When there is a widespread problem (as there was this
past weekend) the virtualization team would like to have a report on which
VMs top the list of, for example, cpu load from Xymon historical data. (Yes
there are ESX based tools, but they have not spent the $$ to put them on
all of the servers.) I pointed the team manager to the metrics report in
Xymon and he was impressed, but doesn't want to have to look at a graph
containing plots for a few hundred hosts to find the top 10.
So, I'm looking at writing a script to mine the rrd or history data from
the Xymon server to produce the list he wants. He is also interested in the
top disk I/O numbers, too, but I'm focusing on load average for now.
He says he just wants an average for each host over the 48 hours of the
weekend, which is when we usually see problems.
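
A rough sketch of the ranking step described above, assuming the per-host load samples for the 48-hour window have already been pulled out of the RRD files into plain lists. The function name `top_hosts`, the host names, and the sample numbers are all made up for illustration; nothing here is a built-in Xymon feature.

```python
from statistics import mean


def top_hosts(samples_by_host, n=10):
    """Return the n hosts with the highest mean load, highest first.

    samples_by_host maps a host name to a list of load-average samples
    covering the window of interest (e.g. the 48 hours of a weekend).
    """
    averages = {
        host: mean(values)
        for host, values in samples_by_host.items()
        if values  # skip hosts that have no usable samples
    }
    # Sort by average load, largest first, and keep the first n entries.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical data, not real measurements.
example = {
    "vm-a": [0.5, 1.5],
    "vm-b": [4.0, 2.0],
    "vm-c": [],
}

for host, avg in top_hosts(example, n=2):
    print(f"{host}\t{avg:.2f}")
```

The same function would work for disk I/O numbers, since it only cares about a list of samples per host.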
Has anyone done this or something like it? I don't see anything in Xymon
already built in to get close so I was looking at rrdtool fetch. However,
this is cumbersome and, frankly, I'm not understanding the data I'm getting
back (for example 1.22749483e+03 seems to be 12.27... when I compare it to
the graphs, so the e+03 seems to really mean *10^1, right?)
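
For what it's worth, 1.22749483e+03 is ordinary scientific notation, so it reads as 1227.49483. A stored value of about 1227 lining up with 12.27 on the graph would be consistent with the load average being stored multiplied by 100; that factor is an assumption here, not something confirmed in this post. Below is a minimal sketch of parsing the text that `rrdtool fetch <file> AVERAGE -s <start> -e <end>` prints, under that assumption. The function name `parse_fetch`, the `scale` parameter, and the sample text are made up for illustration.

```python
import math


def parse_fetch(text, scale=100.0):
    """Parse `rrdtool fetch` text output into (timestamp, value) pairs.

    Only the first data column is read. Rows whose value is NaN (no data
    for that interval) are skipped. scale=100 is an ASSUMPTION: that the
    load average is stored multiplied by 100.
    """
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header line with data-source names, or a blank line
        fields = rest.split()
        if not fields:
            continue
        value = float(fields[0])  # float() understands "1.2e+03" and "nan"
        if math.isnan(value):
            continue
        rows.append((int(stamp), value / scale))
    return rows


# Hypothetical output in the shape rrdtool fetch prints; not real data.
sample = """\
                             la

1363392000: 1.2274948300e+03
1363392300: -nan
1363392600: 2.5000000000e+02
"""

for stamp, load in parse_fetch(sample):
    print(stamp, round(load, 2))
```

If the assumed factor of 100 turns out to be wrong for a given RRD file, passing a different `scale` is the only change needed.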
But I ramble. Thanks for any help.
Steve Holmes
Purdue

sholmes42＠mac.com
