Mysterious Sawtooth Graphs
Hi
I use Hobbit to monitor about 700 systems. I get some mysterious-looking graphs with the CONN test and also with the bbgen test itself. They look like two overlaid sawtooth curves. Any idea why the graphs look so weird? I cannot believe these are the real response times.
Here are two demo pics:
http://www.trektech.de/test/hobbitgraph_conn.png http://www.trektech.de/test/hobbitgraph_bbtest.png
BTW, is there a way to speed up the connect test? It takes about 35 seconds, which is not critical but not very fast either.
Thorsten Erdmann ITI/EP68 Mercedes Benz Werk Hamburg Tel.: +49-40-7920-2593 mobil: +49-160-8614383 Lotus-Fax:+49-711-1779043874 mail: thorsten.erdmann at daimler.com
If you are not the intended addressee, please inform us immediately that you have received this e-mail in error, and delete it. We thank you for your cooperation.
We get them as well. Not sure why.
-- Stewart
An infinite number of mathematicians walk into a bar. The first one orders a beer. The second orders half a beer. The third, a quarter of a beer. The bartender says "You're all idiots", and pours two beers.
I've seen them too. Even on HTTP test graphs. And also not entirely sure why.
As far as the ping bounces around 20-40ms are concerned, this is a problem with hobbitping and its polling algorithm. If you want the timing to be correct, I highly suggest you switch to fping. Install fping and change the FPING setting in hobbitserver.cfg:
    FPING="/usr/sbin/fping"
Note: I've used the following on mine with good results, for several thousand machines:
    FPING="/usr/sbin/fping -i10 -t1500 -r2"
Note that you must make sure the xymon or hobbit user has rights to run fping, either through a sudo arrangement or by making the binary setuid, for example:
    chmod g+x /usr/sbin/fping
    chgrp xymon /usr/sbin/fping
    chmod u+s /usr/sbin/fping
Not sure about your statement regarding connect tests running 35 seconds. Do you mean the ping or TCP test times listed on the bbtest info page? Or the test setup or DNS resolve times? For 700 hosts, 35 seconds isn't too bad; I've seen 4000 hosts or so run in maybe 110-120 seconds.
-Alan
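As a rough sanity check on the fping settings above, the sketch below estimates the least time a sweep needs just to send one probe per host. It assumes, per the fping manual, that -i is the minimum gap in milliseconds between packets sent to any target. The function name min_sweep_seconds is made up for illustration, and replies, retries (-r) and timeouts (-t) are ignored, so real runs will take longer.

```python
def min_sweep_seconds(hosts: int, interval_ms: float) -> float:
    """Lower bound on sweep time: one probe per host, spaced interval_ms apart.

    Assumption: interval_ms corresponds to fping's -i option. Retries,
    timeouts and reply handling are deliberately left out.
    """
    if hosts < 0 or interval_ms < 0:
        raise ValueError("hosts and interval_ms must be non-negative")
    return hosts * interval_ms / 1000.0


if __name__ == "__main__":
    # Host counts mentioned in the thread, with -i10 as in the FPING example.
    for hosts in (700, 4000):
        print(hosts, "hosts:", min_sweep_seconds(hosts, 10), "s minimum")
```

With -i10 this gives a floor of about 7 seconds for 700 hosts and about 40 seconds for 4000 hosts, so the 110-120 seconds mentioned for 4000 hosts sits well above the pure sending time.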
> As far as the ping bounces around 20-40ms are concerned, this is a problem with hobbitping, and its polling algorithm. If you want the timing correct, I highly suggest you switch to fping. Install fping and change the FPING setting in hobbitserver.cfg: FPING="/usr/sbin/fping"
I will try that out. I remember having problems with fping, which is why I switched to hobbitping.
> Note you must make sure the xymon or hobbit user has rights to run fping, either by a sudo arrangement or by setting up a setuid capability such as: chmod g+x /usr/sbin/fping chgrp xymon /usr/sbin/fping chmod u+s /usr/sbin/fping
Maybe that was my fping problem. :-)
> Not sure about your statement regarding connect tests running 35 seconds. You mean the ping or tcp test times listed in the bbtest info page?
I mean the PING test times on the bbtest page:
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1250148155.074249  -
Service definitions loaded              1250148155.075870  0.001621
Tests loaded                            1250148155.103019  0.027149
DNS lookups completed                   1250148155.105950  0.002931
Test engine setup completed             1250148155.109315  0.003365
TCP tests completed                     1250148155.110463  0.001148
PING test completed (651 hosts)         1250148187.146047  32.035584  <--- This one
PING test results sent                  1250148187.283501  0.137454
Test result collection completed        1250148187.283511  0.000010
LDAP test engine setup completed        1250148187.283513  0.000002
LDAP tests executed                     1250148187.283522  0.000009
LDAP tests result collection completed  1250148187.283523  0.000001
NSLOOKUP tests executed                 1250148187.287590  0.004067
Test results transmitted                1250148187.360348  0.072758
bbtest-net completed                    1250148187.362335  0.001987
TIME TOTAL                                                 32.288086
Bye Thorsten
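To put the TIME SPENT figures above in proportion, here is a small sketch that parses the rows and reports which step dominates the run. The rows are copied from the table in this message; the names REPORT, ROW and parse_rows are hypothetical helpers for this illustration and not part of Hobbit.

```python
import re

# Rows copied from the TIME SPENT table: event, start time, duration.
REPORT = """\
bbtest-net startup 1250148155.074249 -
Service definitions loaded 1250148155.075870 0.001621
Tests loaded 1250148155.103019 0.027149
DNS lookups completed 1250148155.105950 0.002931
Test engine setup completed 1250148155.109315 0.003365
TCP tests completed 1250148155.110463 0.001148
PING test completed (651 hosts) 1250148187.146047 32.035584
PING test results sent 1250148187.283501 0.137454
Test result collection completed 1250148187.283511 0.000010
LDAP test engine setup completed 1250148187.283513 0.000002
LDAP tests executed 1250148187.283522 0.000009
LDAP tests result collection completed 1250148187.283523 0.000001
NSLOOKUP tests executed 1250148187.287590 0.004067
Test results transmitted 1250148187.360348 0.072758
bbtest-net completed 1250148187.362335 0.001987
"""

# Event name, then an epoch timestamp, then a duration or "-" for the first row.
ROW = re.compile(r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<duration>\d+\.\d+|-)$")


def parse_rows(text):
    """Return a list of (event, duration_seconds); "-" counts as 0.0."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip anything that is not a table row
        duration = match.group("duration")
        rows.append((match.group("event"), 0.0 if duration == "-" else float(duration)))
    return rows


HOSTS = 651  # taken from the "PING test completed (651 hosts)" row

rows = parse_rows(REPORT)
total = sum(duration for _, duration in rows)
slowest_event, slowest = max(rows, key=lambda row: row[1])
share = 100 * slowest / total
per_host_ms = 1000 * slowest / HOSTS

print(f"total: {total:.6f} s")
print(f"slowest: {slowest_event} ({slowest:.6f} s, {share:.1f}% of the run)")
print(f"average per host: {per_host_ms:.1f} ms")
```

The summed durations match the reported TIME TOTAL of 32.288086 seconds. The PING step accounts for roughly 99% of that, which works out to about 49 ms per host on average, so any speed-up has to come from the ping phase.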
For what it's worth I've been seeing them too - I thought it was an oddity of our local network.
Joe
I see them very often with pings of less than 1 ms.
-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
"When you have eliminated the impossible, that which remains, however improbable, must be the truth." --- Sir Arthur Conan Doyle
Just to add to the other confirmations...
We've noticed this behaviour too (also using fping) after moving to 4.3.0.0 beta2 (clean install).
We don't have very many hosts at the moment, but most of our hosts are multi-homed. The monitoring server has a separate network link for each VLAN/network, so no routing is involved for each connection test.
We thought the problem might be one or two faulty NICs on the monitoring server, but we get the 'sawtooth' behaviour with some systems and not with others within the same VLAN/network.
So far we haven't found a solution though.
Regards, Mario
On Wednesday, 12 August 2009 12:21:16 thorsten.erdmann at daimler.com wrote:
> Hi
> I use Hobbit to monitor about 700 systems. I get some mysterious looking graphs with the CONN test and also the bbgen test itself. It looks like two overlayed sawtooth curves. Any idea why the graphs look so weird? I cannot believe these are the real response times.
Are you using hobbitping, or fping? If you are using hobbitping, that would explain your high ping times (20 ms+). You should install fping, and set the FPING variable in hobbitserver.cfg to a name that will find fping (either full path, or just name if it is in the path).
> Here are two demo pics:
> http://www.trektech.de/test/hobbitgraph_conn.png http://www.trektech.de/test/hobbitgraph_bbtest.png
> BTW.: is there a way to speed up the connect test. It needs about 35sec which is not critical but not very fast.
For how many devices are you running network tests, and at what interval do you want them to run? I've usually only run network tests at 1-minute intervals; a higher frequency doesn't hold much benefit IMHO.
Regards, Buchan
participants (7)
- asparks@doublesparks.net
- bgmilne@staff.telkomsa.net
- joe@tmsusa.com
- josh@imaginenetworksllc.com
- mv652@softhome.net
- stewartl42@gmail.com
- thorsten.erdmann@daimler.com