[hobbit] More Granular data than 300 second samples, duh!

15 Oct 2007


      On Sun, Oct 14, 2007 at 10:19:22PM -0400, Scott Walters wrote:
...
One of the most common questions about the trending of data is "How do
I make the charts graph data samples which are smaller than 300
seconds?"  And the answer has been: you have the source, have fun.
The original design decision that Henrik inherited was larrd should
only be for capacity planning and NOT real-time performance analysis.
Do one thing and do it well.
I had a thought the other day, and I think we could possibly get the
"best of both worlds."
Instead of
$ vmstat 300 2 (resulting in one 300 second sample)
why not
$ vmstat 5 61 (resulting in sixty 5 second samples)
The data would still only be transported every five minutes, but
contain more granular samples.
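
As a rough illustration of the client side of this idea, the sketch
below turns the text of one "vmstat 5 61" run into a list of samples.
It assumes the Linux procps column layout (a "procs ---memory---"
banner, then a header row, then all-numeric data rows); other Unixes
print different columns. The function name parse_vmstat and the SAMPLE
text are made up for the example. The first data row is dropped because
vmstat reports averages since boot there, which is why 61 rows yield 60
interval samples.

```python
def parse_vmstat(output):
    """Return one dict per interval sample from `vmstat <interval> <count>` text."""
    header = None
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or "--" in line:
            continue  # blank line, or the "procs ---memory---" banner
        if all(f.isdigit() for f in fields):
            # Data row: keep it only if it matches the header seen so far.
            if header and len(fields) == len(header):
                rows.append(dict(zip(header, map(int, fields))))
        else:
            header = fields  # column names; vmstat may repeat them
    return rows[1:]  # first row is the since-boot average, not a sample


# Hypothetical output, shortened to three rows (i.e. "vmstat 5 3").
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 402112    0    0    12     8  150  300  3  1 96  0  0
 1  0      0 811900  10240 402120    0    0     0     4  120  250  5  2 93  0  0
 0  0      0 811700  10240 402130    0    0     0     0  110  240  1  1 98  0  0
"""

samples = parse_vmstat(SAMPLE)
```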
Scott, you have obviously been on the receiving end of LARRD related
questions for a long time, so I guess you know what the users have
asked for.
I haven't had a lot of requests for more granular data to begin with;
most of the requests have been for the fine-grained (5-minute) data to
be maintained for a longer period of time than the current 48 hours.
In the next version (or the current snapshot), you can define RRAs
individually for each type of RRD file. So you can configure the vmstat
RRDs to maintain the fine-grained data for a longer time. That should
take care of this issue.
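
To make the retention arithmetic concrete: an RRA holds a fixed number
of rows, so keeping fine-grained data longer just means more rows. The
sketch below computes the row count and prints an RRA string in the
RRA:CF:xff:steps:rows form that "rrdtool create" accepts. The helper
names and the 0.5 xff value are choices made for the example; how the
Hobbit configuration exposes these settings is not shown here.

```python
def rra_rows(step_seconds, steps_per_row, retention_seconds):
    """Rows needed to cover retention_seconds, rounded up."""
    row_span = step_seconds * steps_per_row
    return -(-retention_seconds // row_span)  # ceiling division


def rra_definition(step_seconds, steps_per_row, retention_seconds, cf="AVERAGE"):
    """Build an RRA string in rrdtool's RRA:CF:xff:steps:rows form."""
    rows = rra_rows(step_seconds, steps_per_row, retention_seconds)
    return f"RRA:{cf}:0.5:{steps_per_row}:{rows}"


HOUR = 3600
DAY = 24 * HOUR

two_days = rra_definition(300, 1, 48 * HOUR)   # 5-minute data for 48 hours
thirty_days = rra_definition(300, 1, 30 * DAY)  # same data kept for 30 days
print(two_days)
print(thirty_days)
```

The 48-hour case works out to 576 rows of 300-second data; the same
48 hours at a 5-second step would need 34560 rows.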
I think your idea is worth looking into.
...
This could not be done for all metrics, but for many.  This would also
require the RRAs of all the RRDs to be re-made (export, re-create,
import).  But that's been on my mind anyway, because the original
RRA structure was based on screen sizes for 800x600, instead of
business requirements.
If I understand your suggestion correctly, you would change the client
to run "vmstat 5 61" (for instance), collect all 60 samples, and then
send them off to Hobbit every 5 minutes. So we would essentially be
caching data for 5 minutes on the client, then send it off to the Hobbit
server and do a single multi-update of the RRD data when it arrives.
One complication with this is that Hobbit needs to determine the
timestamps for each of the samples, because RRDtool needs each
measurement timestamped. In the current setup, Hobbit just uses the time
that the data arrives from the client - this will be "close enough" to
the time the measurement was done to work. But if the client caches the
data for some amount of time, we have to find a way of generating the
correct timestamps. Just having the client timestamp it with its own
local time won't work - there are too many hosts where the clocks are
way off. I guess this could be done by having the client timestamp the
data, but then use these as relative timestamps (so we can see sample 10
was done 236 seconds before the last sample) and then work out the exact
timestamps over on the Hobbit server, like we do today.
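
A minimal sketch of that relative-timestamp idea, assuming the server
treats the arrival time as the time of the last sample (as it does for
the single sample today). The function name rebase_timestamps is made
up for the example; only the differences between the client timestamps
are used, so a client clock that is hours off does not matter.

```python
def rebase_timestamps(client_times, server_arrival):
    """Shift client timestamps so the last sample lands on server_arrival.

    Only the spacing between client timestamps is trusted; their absolute
    values are discarded.
    """
    if not client_times:
        return []
    offset = server_arrival - client_times[-1]
    return [t + offset for t in client_times]


# Client clock is wildly wrong, but the 236-second gap is preserved.
rebased = rebase_timestamps([764, 1000], server_arrival=1192400000)
```

This still inherits today's transport delay for the whole batch; a
client that also reported its own send time would let the server
correct for the gap between the last sample and the send.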
This could be done - it would require a bit of change to the clients,
but I'm not really happy with the current way the vmstat data collection
works (it usually leaves a vmstat process hanging around when the client
is stopped), so I wouldn't mind having to do some code for this. I'd
probably write a small tool to run "vmstat 60" so it runs forever, and
then the tool would pick up the data, timestamp it and then regularly
feed it into the client report.
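
Such a tool might look roughly like the sketch below. It is not the
Hobbit client code; the function names are invented, and it assumes
vmstat data rows are the lines whose first field is a number. The
"finally" block is there to address the hanging-process problem: when
the consumer stops reading, the vmstat child is terminated.

```python
import subprocess
import time


def timestamp_lines(lines, clock=time.time):
    """Yield (local_timestamp, line) for each numeric vmstat data row."""
    for line in lines:
        line = line.strip()
        if line and line.split()[0].isdigit():
            yield int(clock()), line


def run_vmstat(interval_seconds):
    """Run `vmstat <interval>` until the caller stops iterating."""
    proc = subprocess.Popen(
        ["vmstat", str(interval_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        yield from timestamp_lines(proc.stdout)
    finally:
        # Do not leave a stray vmstat behind when the client stops.
        proc.terminate()
        proc.wait()


# Dry run with canned lines and a fixed clock, no vmstat needed.
stamped = list(timestamp_lines([" r  b\n", " 1  0  5\n"], clock=lambda: 42))
```

Note that the first row vmstat prints is its since-boot average, so a
real collector would discard it once at startup.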
And of course the server-end would need changing to accommodate the new
data format and the multiple updates.  It's certainly doable, without a
whole lot of re-designing.
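
For the "multiple updates" part, "rrdtool update" accepts several
timestamp:value[:value...] arguments in one call, so a batch can be
written in a single invocation. The sketch below only builds the
argument list; the file name vmstat.rrd, the function name, and the
assumption that values are listed in the RRD's data-source order are
all illustrative. Samples are sorted first because rrdtool rejects
updates whose timestamps do not increase.

```python
def build_update_command(rrd_path, samples):
    """Build one `rrdtool update` argv for a batch of (timestamp, values)."""
    args = ["rrdtool", "update", rrd_path]
    for ts, values in sorted(samples, key=lambda s: s[0]):
        # One argument per sample: "timestamp:value1:value2:..."
        args.append(":".join([str(int(ts))] + [str(v) for v in values]))
    return args


cmd = build_update_command(
    "vmstat.rrd",  # hypothetical file name
    [(1005, [2, 98]), (1000, [1, 99])],
)
print(" ".join(cmd))
```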
But I think we should consider which datasets one might want to have
these frequent updates for. vmstat is obvious; but what about memory
utilisation? Disk utilisation rarely changes rapidly - or perhaps it
does? Process counts? Network test response times? Once we start doing
it for vmstat, I'd expect everyone to come forward and ask for it for
lots of other datasets - so instead of doing a quick hack just for
vmstat, we should consider what would be the "right" way of doing it for
all/most of the data.
...
Henrik, do you follow my thinking?  It's kinda hard for me to believe
it's taken me over five years to think of this!
Things take time - and you often don't get it right until the third try.
...
My biggest concern is not the technical details of the collectors and
RRD/RRA restructuring, but inflicting resource usage on servers
measuring themselves.
$ vmstat 1 301 would definitely be a bad idea.
Agreed - but I don't think that should be something Hobbit decides. I
can easily imagine a scenario where you would do that for some
troubleshooting situation, and if that is what is needed then Hobbit
should let you do it. No reason to set up arbitrary restrictions.
(This is in line with Unix thinking - "if you insist on shooting your
foot off, it's your decision to do so". Just as "rm -rf /" is not
recommended, but still possible).
Regards,
Henrik

henrik＠hswn.dk