[hobbit] TCP/IP stats (bits/s) limited to 100M
On Wed, Jun 28, 2006 at 04:07:44PM +0200, Nicolas Dorfsman wrote:
The RRD files are created as "DERIVE" datatypes with a minimum
value of 0, which should handle 32/64-bit counter overflows automatically. (See the rrdcreate man-page).
Well... the man page is not so confident:
If you cannot tolerate ever mistaking the occasional counter reset for a legitimate counter wrap, and would prefer "Unknowns" for all legitimate counter wraps and resets, always use DERIVE with min=0. Otherwise, using COUNTER with a suitable max will return correct values for all legitimate counter wraps, mark some counter resets as "Unknown", but can mistake some counter resets for a legitimate counter wrap.
OK, you got me on that one.
It seems that using COUNTER for the byte-counts in both the netstat- and ifstat-RRD's might be a good idea. The question then becomes "what's a suitable max" for these data ? Should I assume they are 32-bit counters ? I know some of them are not (e.g. Solaris has 64-bit counters for bytes in/out per interface).
I'll change it to a counter now, with MAX set to "unknown". The overflow handling should still work correctly, if I understand the RRD docs right.
Note: This doesn't affect all of the existing RRD's, only new ones created.
Regards, Henrik
On Jul 9, 2006, at 12:18 PM, Henrik Stoerner wrote:
OK, you got me on that one.
Not really, you inherited this ;) He is trying to get me, and his
point is valid, but the tool 'works as designed', read on . . .
It seems that using COUNTER for the byte-counts in both the netstat- and ifstat-RRD's might be a good idea.
*might* being the operative word there
The question then becomes "what's a suitable max" for these data ? Should I assume they are 32-bit counters ? I know some of them are not (e.g. Solaris has 64-bit counters for bytes in/out per interface).
exactly, and it is even more complicated than that . . . see below
I'll change it to a counter now, with MAX set to "unknown". The
overflow handling should still work correctly, if I understand the RRD docs right.
I would not recommend this. Another major issue is that counter resets
(e.g. a reboot), as opposed to overflows, get mistaken for wraps if the
MAX is not correct. From what I recall, if you use COUNTER and anything
gets mistaken, you get a massive spike in the RRD, making all the data
relatively useless because the y axis autoscales to the spike.
With DERIVE and min=0 you acknowledge you won't handle counter wraps
correctly (which are not that common anyway), but the result for all
wraps/resets is benign: a NaN, which does *not* cause a
spike. I am a firm believer that no data is better than bad data.
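The difference between the two data source types on a reboot can be sketched in a few lines of Python. This is a simplified model of the rate calculation, not rrdtool's actual code: the function names, the sample counter values, and the 300-second step are all assumptions made for illustration. It models COUNTER as adding 2^32 (and then, if still negative, the remainder up to 2^64) to a negative difference, and DERIVE with min=0 as turning any negative rate into an unknown.

```python
from typing import Optional

WRAP_32 = 2**32
WRAP_64 = 2**64


def counter_rate(prev: int, curr: int, step: int,
                 max_rate: Optional[float] = None) -> Optional[float]:
    """Simplified COUNTER model: a decrease is assumed to be a wrap."""
    delta = curr - prev
    if delta < 0:
        delta += WRAP_32            # assume a 32-bit counter wrapped
    if delta < 0:
        delta += WRAP_64 - WRAP_32  # otherwise assume a 64-bit wrap
    rate = delta / step
    if max_rate is not None and rate > max_rate:
        return None                 # above max: stored as unknown
    return rate


def derive_rate(prev: int, curr: int, step: int,
                min_rate: Optional[float] = 0.0) -> Optional[float]:
    """Simplified DERIVE model: plain difference, negatives rejected by min."""
    rate = (curr - prev) / step
    if min_rate is not None and rate < min_rate:
        return None                 # below min: stored as unknown (NaN)
    return rate


if __name__ == "__main__":
    # Hypothetical reboot: the counter falls from 3e9 bytes back to about 1 MB.
    before, after, step = 3_000_000_000, 1_000_000, 300
    print("COUNTER, no max:", counter_rate(before, after, step))
    print("DERIVE, min=0  :", derive_rate(before, after, step))
```

With these made-up numbers the COUNTER model reports roughly 4.3 MB/s for an interval in which almost nothing was transferred, which is the spike described above, while the DERIVE model returns an unknown.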
I am not opposing the idea that COUNTER with a correct max is the
'right way'. The problem with software that runs on so many
platforms is that the correct max is impossible to know for certain.
Defining the MAX as just whatever the 32/64-bit value is would not be
adequate, because reboots will cause spikes; you'd need to know the MAX
for the particular metric, and that is completely impossible to know
absolutely. The inbytes MAX would need to be different for 10Mb/s,
100Mb/s, 1000Mb/s, Token Ring 16Mb/s, etc, etc.
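To make the per-link arithmetic concrete, here is a rough Python sketch that derives a byte-rate ceiling from a nominal link speed and estimates how long a 32-bit byte counter takes to wrap at that ceiling. The table of link types and the helper names are illustrative assumptions, and real interfaces rarely run at full line rate.

```python
# Nominal link speeds in megabits per second (illustrative values only).
LINK_SPEEDS_MBIT = {
    "10Mb/s Ethernet": 10,
    "Token Ring 16Mb/s": 16,
    "100Mb/s Ethernet": 100,
    "1000Mb/s Ethernet": 1000,
}


def max_bytes_per_second(mbit: int) -> float:
    """Upper bound on the byte rate for a link of the given nominal speed."""
    return mbit * 1_000_000 / 8


def seconds_to_wrap_32bit(mbit: int) -> float:
    """Time for a 32-bit byte counter to wrap at sustained full line rate."""
    return 2**32 / max_bytes_per_second(mbit)


if __name__ == "__main__":
    for name, mbit in LINK_SPEEDS_MBIT.items():
        print(f"{name}: max {max_bytes_per_second(mbit):,.0f} bytes/s, "
              f"32-bit wrap after {seconds_to_wrap_32bit(mbit):,.0f} s")
```

A ceiling that suits a 1000Mb/s interface is a hundred times too loose for a 10Mb/s one, which is why a single max cannot serve every platform.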
DERIVE with min=0 and NaN is a much better compromise than the spikes. And I
would bet the farm reboots are a much more common event than counter
wraps for the majority of environments.
And Henrik, the net result to you will be answering an endless stream
of emails regarding why every COUNTER RRD has spikes . . . I've been
there, done that ;) I am almost 100% positive there is not *one*
counter RRD in the larrd stuff, all DERIVE. It's not impossible
rrdtool has changed to alleviate some of this, but from what I have
read of your email streams I haven't seen anything to support that.
scott
Hi Scott,
On Sun, Jul 09, 2006 at 04:02:32PM -0400, Scott Walters wrote:
The question then becomes "what's a suitable max" for these data ? Should I assume they are 32-bit counters ? I know some of them are not (e.g. Solaris has 64-bit counters for bytes in/out per interface).
exactly, and it is even more complicated than that . . . see below
[snip explanation]
And Henrik, the net result to you will be answering an endless stream
of emails regarding why every COUNTER RRD has spikes . . . I've been
there, done that ;) I am almost 100% positive there is not *one*
counter RRD in the larrd stuff, all DERIVE. It's not impossible
rrdtool has changed to alleviate some of this, but from what I have
read of your email streams I haven't seen anything to support that.
Your experience certainly carries a lot of weight. Since I've never used any COUNTER datasets I haven't seen this problem (you're right: all the LARRD DS definitions use DERIVE - I copied those just about verbatim into Hobbit).
So - I've undone the change. Back to DERIVE with MIN=0, and we'll see how much trouble that gives us. So far, only one person has noticed bad effects from this.
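For reference, the two alternatives discussed in this thread differ only in the data source definition passed to rrdtool when an RRD is created, which takes the form DS:name:type:heartbeat:min:max with "U" for unknown. The small Python helper below just builds those strings. The data source name "inbytes" and the 600-second heartbeat are placeholders, not necessarily what Hobbit uses.

```python
from typing import Optional


def ds_definition(name: str, dst: str, heartbeat: int,
                  minimum: Optional[float], maximum: Optional[float]) -> str:
    """Build an rrdtool DS definition string; None becomes 'U' (unknown)."""
    def fmt(value: Optional[float]) -> str:
        return "U" if value is None else str(value)

    return f"DS:{name}:{dst}:{heartbeat}:{fmt(minimum)}:{fmt(maximum)}"


if __name__ == "__main__":
    # DERIVE with min=0: negative rates (wraps and resets) become unknown.
    print(ds_definition("inbytes", "DERIVE", 600, 0, None))
    # COUNTER with an unknown max: decreases are treated as wraps.
    print(ds_definition("inbytes", "COUNTER", 600, 0, None))
```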
Thanks, Henrik
On 9 Jul 06, at 22:23, Henrik Stoerner wrote:
So - I've undone the change. Back to DERIVE with MIN=0, and we'll see how much trouble that gives us. So far, only one person has noticed bad effects from this.
Sorry to be that one ;)
The choice is not easy. The best approach might be to have a specific
manual section on this, with some option to set this RRD up correctly.
Nicolas
participants (3): henrik@hswn.dk, ndo@unikservice.com, scott@PacketPushers.com