ifstat numbers tracking 10Gb network performance incorrectly
Hi,
We have been using xymon to track various performance metrics across our environment, and it has been a great tool for drilling into historical data to understand overall system performance. One thing we've had difficulty with is the stats returned by the "ifstat" module, specifically on 10Gbit ethernet interfaces. On xymon clients with 10G interfaces, the recorded metrics are sometimes clobbered and recorded incorrectly. At times when we know the server's network activity is relatively high (e.g. 3+ Gbits/sec), the xymon ifstat charts might show only 110Mbit/sec. Further, while we know the activity is not constant, the charts are often very "flat" and appear clipped at that level, as if by an artificial cap. It is as if xymon, or perhaps the RRD modules that xymon uses, is mishandling the activity metrics that the client returns.
We are using xymon's default DERIVE types for the ifstat metrics, bytesSent and bytesReceived. I have traced the obytes64 and rbytes64 values returned by the client, and they look sane and match what other network tools on the client are telling us. But those values are getting mangled somewhere between there and the graphs. Dumping the rrd files, I can see that the "PDP" (latest update) values are correct (e.g. ~2e+09), but the "CDP" (5-min average) values are not (e.g. ~1.4e+07). How can that be? Why would RRD do that?
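For reference, here is a rough sketch of the arithmetic involved. A DERIVE data source stores the per-second difference between successive counter samples. Assuming the stored values are bytes per second (an assumption, as are the function names and sample counters below, which are purely illustrative), a CDP value of ~1.4e+07 works out to about 112 Mbit/sec, which lines up with the flat level on the charts.

```python
def derive_rate(prev_count, curr_count, interval_s):
    """Per-second rate between two counter samples (DERIVE-style difference)."""
    return (curr_count - prev_count) / interval_s


def bytes_per_sec_to_mbit(rate_bytes_per_sec):
    """Convert a bytes/sec rate to Mbit/sec (decimal megabits)."""
    return rate_bytes_per_sec * 8 / 1e6


# Hypothetical 64-bit byte counter samples taken 300 seconds apart.
prev = 1_000_000_000_000
curr = prev + 112_500_000_000  # 112.5 GB moved during the interval

rate = derive_rate(prev, curr, 300)
print(bytes_per_sec_to_mbit(rate))    # 3000.0 Mbit/sec, i.e. 3 Gbit/sec
print(bytes_per_sec_to_mbit(1.4e7))   # 112.0 Mbit/sec, the "capped" level
```

This only shows that the bad CDP values and the capped charts agree with each other, so the mangling appears to happen before or during RRD consolidation rather than at graph time.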
Mysteriously, every so often the graphs seem to spring to life and show reasonable values above 1Gbit. Sometimes this happens around the time we're tinkering with one network element or another. But after a while (minutes to hours), they just as mysteriously revert to the bogus "capped" values. (The RRD's CDP values reflect this as well.) This is most aggravating, as it leaves us unable to trust these charts. So the rrd is _sometimes_ showing good data, but often not.
We are working through the process of updating everything (xymon, rrdtool) to the latest versions, but we are wondering whether this might be a problem in our configuration that software updates alone won't fix.
Is there some well-worn advice out there about how to configure xymon to properly gather/store/chart network performance stats for 10Gbit networks, specifically the "ifstat" module?
Thanks in advance for any tips, pointers, ideas...
Steve Groom sgroom at ipac.caltech.edu