TCP/IP stats (bits/s) limited to 100M


ndo＠unikservice.com

28 Jun 2006

8:20 a.m.

Hi,

I have suspicious data in the TCP/IP stats on Solaris and AIX.

The graphs tell me that my hosts never send or receive more than ~110 Mbit/s. This raised an alert for the backup server, which has a GigEth interface and needs to absorb many backup flows simultaneously.

So I checked other GigEth-equipped hosts. I have some Solaris hosts with GigEth, and the graphs are inconsistent between the per-interface details and the general TCP/IP graphs. Take a look:

http://www.unikservice.com/frp/tcpip.png
http://www.unikservice.com/frp/i1.png
http://www.unikservice.com/frp/i2.png

So I suspect an issue with the collection or the graphs. Could somebody tell me where I should start debugging?

Nicolas


olivier.beau＠telecomitalia.fr

28 Jun 2006

9:42 a.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

Hi,

This looks like the TCP data wrapping around a 32-bit counter. Are your counters 32-bit? Could you give us a copy of them?

olivier


ndo＠unikservice.com

9:54 a.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

On 28 June 2006 at 11:42, Beau Olivier wrote:

...

Hi,

This looks like the TCP data wrapping around a 32-bit counter. Are your counters 32-bit? Could you give us a copy of them?

I'd be glad to.

Which counter does Hobbit use?

Nicolas

henrik＠hswn.dk

11:09 a.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

On Wed, Jun 28, 2006 at 11:42:14AM +0200, Beau Olivier wrote:

...

Hi,

This looks like the TCP data wrapping around a 32-bit counter. Are your counters 32-bit? Could you give us a copy of them?

The RRD files are created as "DERIVE" datatypes with a minimum value of 0, which should handle 32/64-bit counter overflows automatically. (See the rrdcreate man-page).

Note that Hobbit never does any calculations on these values, it passes them directly (as strings) to the RRDtool functions.

Regards, Henrik

wxxx333＠gmail.com

1:42 p.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

Hi,

The ~110 Mbit/s value you get really does point to a 32-bit counter wrap: with a 32-bit BYTE counter sampled every 5 minutes, about 110 Mbit/s is the maximum you can count without wrapping the counter.
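
The arithmetic behind that ceiling can be checked with a short sketch. It assumes a 32-bit byte counter and a collector that only sees the counter modulo 2**32 once per 300-second interval; the function names are illustrative and not part of Hobbit or RRDtool.

```python
COUNTER_MODULUS = 2 ** 32      # a 32-bit counter wraps after 4294967295
STEP_SECONDS = 300             # 5-minute sampling interval


def max_unambiguous_mbits(step_seconds=STEP_SECONDS):
    """Highest average rate (Mbit/s) that fits in one counter cycle per sample."""
    return COUNTER_MODULUS * 8 / step_seconds / 1e6


def apparent_mbits(true_mbits, step_seconds=STEP_SECONDS):
    """Rate a collector would see if it only observes the counter modulo 2**32."""
    true_bytes = int(true_mbits * 1e6 / 8 * step_seconds)
    observed_bytes = true_bytes % COUNTER_MODULUS
    return observed_bytes * 8 / step_seconds / 1e6


print(f"ceiling: {max_unambiguous_mbits():.1f} Mbit/s")
for rate in (50, 100, 400, 900):
    print(f"true {rate} Mbit/s -> apparent {apparent_mbits(rate):.1f} Mbit/s")
```

Under these assumptions, any true rate above the ceiling is folded back below it, which would match a graph that never exceeds roughly 110 Mbit/s.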

As Henrik explained below, the wrap should not be caused by Hobbit or RRD. I think you need to look at your OS counters directly and see whether they wrap in less than 5 minutes.

If you are on Solaris / SunOS, you could use something like the command below and watch whether any of the counters wraps past the 32-bit maximum (4294967295, if I recall correctly):

(host)$> while true; do date; netstat -s | egrep "(tcpInInorderBytes|tcpOutDataBytes)"; sleep 60; done

Example output:

Wed Jun 28 09:28:28 EDT 2006
tcpOutDataSegs =5864959 tcpOutDataBytes =4273878800
tcpInInorderSegs =2670997 tcpInInorderBytes =725993348
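
The comparison between successive samples could also be automated. The following is a minimal sketch that parses "name =value" pairs in the format shown above and reports counters that went down between two samples. The second sample is hypothetical, and the helper names are made up for illustration.

```python
import re

# Matches "name =value" pairs as they appear in the example output above.
PAIR_RE = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(text, names=("tcpOutDataBytes", "tcpInInorderBytes")):
    """Return {name: int} for the requested counters found in the text."""
    return {k: int(v) for k, v in PAIR_RE.findall(text) if k in names}


def wrapped(previous, current):
    """Names of counters whose value decreased between two samples.

    A decrease can also mean the counter was reset (for example by a reboot).
    """
    return sorted(n for n in current if n in previous and current[n] < previous[n])


sample_1 = ("tcpOutDataSegs =5864959 tcpOutDataBytes =4273878800\n"
            "tcpInInorderSegs =2670997 tcpInInorderBytes =725993348")
# Hypothetical second sample, taken after tcpOutDataBytes wrapped.
sample_2 = ("tcpOutDataSegs =5899000 tcpOutDataBytes =12345678\n"
            "tcpInInorderSegs =2671500 tcpInInorderBytes =726500000")

before, after = parse_counters(sample_1), parse_counters(sample_2)
print(wrapped(before, after))
```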

(CARE) With RRD it is possible to work around this OS limitation by feeding the data at shorter intervals, say every 2 minutes. RRD will take care of converting the values to the step size used to create the RRD (300 seconds in Hobbit's case).
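
How short the interval needs to be depends on the traffic rate. As a rough sketch, assuming a 32-bit byte counter and a constant rate, the time for one full counter cycle can be estimated as follows (the function name is illustrative):

```python
COUNTER_MODULUS = 2 ** 32


def seconds_to_wrap(mbits_per_second):
    """Time for a 32-bit byte counter to complete one full cycle at a given rate."""
    bytes_per_second = mbits_per_second * 1e6 / 8
    return COUNTER_MODULUS / bytes_per_second


for rate in (100, 1000):
    print(f"{rate} Mbit/s wraps the counter every {seconds_to_wrap(rate):.1f} s")
```

Under these assumptions, a sustained gigabit flow would wrap the counter in well under a minute, so a 2-minute interval raises the ceiling but may not remove it.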

I'm not exactly sure how, or whether, Hobbit will be happy receiving client info more often than every 5 minutes, but I think it should be transparent.

Hope this can give you some help.

Regards
Werner


ndo＠unikservice.com

2:07 p.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

On 28 June 2006 at 15:42, Werner (gmail Lists) wrote:

...

Hi,

The ~110 Mbit/s value you get really does point to a 32-bit counter wrap: with a 32-bit BYTE counter sampled every 5 minutes, about 110 Mbit/s is the maximum you can count without wrapping the counter.

As Henrik explained below, the wrap should not be caused by Hobbit or RRD. I think you need to look at your OS counters directly and see whether they wrap in less than 5 minutes.

If you are on Solaris / SunOS, you could use something like the command below and watch whether any of the counters wraps past the 32-bit maximum (4294967295, if I recall correctly)

Correct. I found this document, which confirms what you're saying:

http://sunsolve.sun.com/search/document.do?assetkey=1-25-72535-1

On 28 June 2006 at 13:09, Henrik Stoerner wrote:

...

On Wed, Jun 28, 2006 at 11:42:14AM +0200, Beau Olivier wrote:

...
Hi,

This looks like the TCP data wrapping around a 32-bit counter. Are your counters 32-bit? Could you give us a copy of them?

The RRD files are created as "DERIVE" datatypes with a minimum
value of 0, which should handle 32/64-bit counter overflows automatically. (See the rrdcreate man-page).

Well... the man page is not so confident:

          COUNTER
              is for continuous incrementing counters like the
              ifInOctets counter in a router. The COUNTER data
              source assumes that the counter never decreases,
              except when a counter overflows.  The update
              function takes the overflow into account.  The
              counter is stored as a per-second rate. When the
              counter overflows, RRDtool checks if the
              overflow happened at the 32bit or 64bit border
              and acts accordingly by adding an appropriate
              value to the result.

          DERIVE
              will store the derivative of the line going from
              the last to the current value of the data
              source. This can be useful for gauges, for
              example, to measure the rate of people entering
              or leaving a room. Internally, derive works
              exactly like COUNTER but without overflow
              checks. So if your counter does not reset at 32
              or 64 bit you might want to use DERIVE and
              combine it with a MIN value of 0.

              NOTE on COUNTER vs DERIVE
                  by Don Baarda <don.baarda at baesystems.com>

                  If you cannot tolerate ever mistaking the
                  occasional counter reset for a legitimate
                  counter wrap, and would prefer "Unknowns"
                  for all legitimate counter wraps and resets,
                  always use DERIVE with min=0. Otherwise,
                  using COUNTER with a suitable max will
                  return correct values for all legitimate
                  counter wraps, mark some counter resets as
                  "Unknown", but can mistake some counter
                  resets for a legitimate counter wrap.

                  For a 5 minute step and 32-bit counter, the
                  probability of mistaking a counter reset for
                  a legitimate wrap is arguably about 0.8% per
                  1Mbps of maximum bandwidth. Note that this
                  equates to 80% for 100Mbps interfaces, so
                  for high bandwidth interfaces and a 32bit
                  counter, DERIVE with min=0 is probably
                  preferable. If you are using a 64bit
                  counter, just about any max setting will
                  eliminate the possibility of mistaking a
                  reset for a counter wrap.
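
To make the difference concrete, here is a simplified model of the two behaviours the man page describes. It is only a sketch of the documented semantics, not RRDtool's actual code, and the sample values are hypothetical.

```python
def counter_rate(prev, cur, seconds):
    """COUNTER-style update: treat a decrease as a 32- or 64-bit wrap."""
    delta = cur - prev
    if delta < 0:
        delta += 2 ** 32              # assume a wrap at the 32-bit border
    if delta < 0:
        delta += 2 ** 64 - 2 ** 32    # otherwise assume the 64-bit border
    return delta / seconds


def derive_rate(prev, cur, seconds, minimum=0.0):
    """DERIVE-style update with min=0: a negative rate becomes unknown (None)."""
    rate = (cur - prev) / seconds
    return None if rate < minimum else rate


prev, cur = 4273878800, 12345678      # hypothetical samples around a wrap
print(counter_rate(prev, cur, 300))   # wrap is compensated
print(derive_rate(prev, cur, 300))    # sample is discarded as unknown
```

In this model, neither type can recover traffic from a counter that wraps more than once within a single step; COUNTER only compensates for one wrap per interval.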

In my particular case (and maybe for any large GigEth flow), COUNTER
with max set to 4294967295 should be the solution.

On 28 June 2006 at 15:42, Werner (gmail Lists) wrote:

...

(CARE) With RRD it is possible to work around this OS limitation by feeding the data at shorter intervals, say every 2 minutes. RRD will take care of converting the values to the step size used to create the RRD (300 seconds in Hobbit's case).

I'm not exactly sure how, or whether, Hobbit will be happy receiving client info more often than every 5 minutes, but I think it should be transparent.

Hmm. I'd prefer to try to fix the RRD file. It may be tricky (export,
import, etc.), but more reliable.

...

Hope this can give you some help.

It definitely helps, thanks!

Nicolas

henrik＠hswn.dk

10:35 a.m.

New subject: [hobbit] TCP/IP stats (bits/s) limited to 100M

On Wed, Jun 28, 2006 at 10:20:11AM +0200, Nicolas Dorfsman wrote:

...

The graphs tell me that my hosts never send or receive more than ~110 Mbit/s. This raised an alert for the backup server, which has a GigEth interface and needs to absorb many backup flows simultaneously. So I checked other GigEth-equipped hosts. I have some Solaris hosts with GigEth, and the graphs are inconsistent between the per-interface details and the general TCP/IP graphs. Take a look:

http://www.unikservice.com/frp/tcpip.png
http://www.unikservice.com/frp/i1.png
http://www.unikservice.com/frp/i2.png

So I suspect an issue with the collection or the graphs. Could somebody tell me where I should start debugging?

Let me explain where these data come from.

The first graph ("TCP/IP statistics") is fed by data from the "netstat -s" command. This is its output (from a Solaris host):

TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
<snip>
tcpCurrEstab = 0 tcpOutSegs =51380214
tcpOutDataSegs =17936799 tcpOutDataBytes =4114388778
<more snip>
tcpInSegs =59097243
tcpInAckSegs =19928198 tcpInAckBytes =4108598170
tcpInDupAck =9794396 tcpInAckUnsent = 0
tcpInInorderSegs =34384580 tcpInInorderBytes =1273412387
tcpInUnorderSegs =970394 tcpInUnorderBytes =694993056
tcpInDupSegs = 70767 tcpInDupBytes =20764736

Hobbit tracks "tcpOutDataBytes" and "tcpInInorderBytes" for the first graph. These are fed into an RRD file, which computes the difference between two measurements and from that an average number of bytes per second over a 5-minute period. For the graph, this is then multiplied by 8 to go from bytes/second to bits/second.
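
As an illustration of that computation, here is a minimal sketch that turns two readings of a byte counter into an average bits/second figure. The readings are hypothetical, and wrap handling is deliberately left out.

```python
def bits_per_second(prev_bytes, cur_bytes, interval_seconds=300):
    """Average bits/s between two readings of a byte counter (no wrap handling)."""
    return (cur_bytes - prev_bytes) / interval_seconds * 8


# Hypothetical readings of tcpOutDataBytes taken 300 seconds apart.
first, second = 1_000_000_000, 1_750_000_000
print(f"{bits_per_second(first, second) / 1e6:.1f} Mbit/s")
```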

What this means is that Hobbit does not count UDP traffic or other non-TCP traffic in this graph. If you have lots of streaming data which typically uses UDP, this can be a significant amount of data.

Also, it doesn't count out-of-order packets (retransmits, duplicate packets - see your OS documentation to learn exactly what goes into the "tcpInUnorderBytes" counter).

The second graph is fed by data from Solaris' "kstat" utility or AIX's "netstat -v" output. As far as I understand, this counts raw Ethernet packet bytes, i.e. all protocols. They are fed into RRD files just like the TCP statistics.

So - most likely the difference is in what protocols are counted for each of the graphs.

Regards, Henrik
