Hello
I'm investigating why bbtest-net runs on a xymon server are taking what I think is a long time. The bbtest-net status displayed on the webpage looks like this:

Statistics:
Hosts total : 2832
Hosts with no tests : 966
Total test count : 2730
Status messages : 2954
Alert status msgs : 0
Transmissions : 30
DNS statistics:
hostnames resolved : 1108
succesful : 2004
failed : 38
calls to dnsresolve : 1116
TCP test statistics:
TCP tests total : 773
HTTP tests : 327
Simple TCP tests : 446
Connection attempts : 772
bytes written : 48486
bytes read : 3283786
Error output: [Edited for brevity]
27 lines of this nature ...
Host foo appears twice in bb-hosts! This may cause strange results
dnsresolve - internal error, name 'ldnpgec01v' not in cache
17 lines of this nature ...
bbtest-net: Cannot resolve IP for host bar
TIME SPENT
Event                                   Starttime         Duration
bbtest-net startup                      9601848.473961           -
Service definitions loaded              9601848.481606    0.007644
Tests loaded, hostname lookups done     9601860.506724   12.025118
Test engine setup completed             9601860.519996    0.013271
TCP tests completed                     9601875.790148   15.270152
PING test completed (1896 hosts)        9601945.698899   69.908750
PING test results sent                  9601950.407327    4.708427
Test result collection completed        9601950.407684    0.000357
LDAP test engine setup completed        9601950.407685    0.000001
LDAP tests executed                     9601950.407686    0.000001
LDAP tests result collection completed  9601950.407687    0.000000
DNS tests executed                      9601951.952093    1.544405
NTP tests executed                      9601963.812322   11.860229
Test results transmitted                9601963.881021    0.068698
bbtest-net completed                    9601964.053873    0.172851
TIME TOTAL                                               115.579911
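To see at a glance where the 115 seconds go, the TIME SPENT rows can be pasted into a small Python sketch that ranks the steps by duration. This is only a convenience script, not part of Xymon; the names `parse_timing` and `rank_steps` are made up for illustration, and it assumes the columns are separated by whitespace as on the status page. On this run the ping step alone is about 60% of the total (69.9 of 115.6 seconds), with the TCP tests, the hostname lookups and the NTP tests at roughly 10 to 13% each.

```python
import re

# The "TIME SPENT" rows from the status page above. Any run of whitespace
# separates the columns; the startup row has "-" instead of a duration.
TIMING_TEXT = """\
bbtest-net startup                      9601848.473961           -
Service definitions loaded              9601848.481606    0.007644
Tests loaded, hostname lookups done     9601860.506724   12.025118
Test engine setup completed             9601860.519996    0.013271
TCP tests completed                     9601875.790148   15.270152
PING test completed (1896 hosts)        9601945.698899   69.908750
PING test results sent                  9601950.407327    4.708427
Test result collection completed        9601950.407684    0.000357
LDAP test engine setup completed        9601950.407685    0.000001
LDAP tests executed                     9601950.407686    0.000001
LDAP tests result collection completed  9601950.407687    0.000000
DNS tests executed                      9601951.952093    1.544405
NTP tests executed                      9601963.812322   11.860229
Test results transmitted                9601963.881021    0.068698
bbtest-net completed                    9601964.053873    0.172851
"""

# Event name, start time, then either a duration or "-".
ROW = re.compile(r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<dur>\d+\.\d+|-)$")


def parse_timing(text):
    """Return [(event, duration_seconds)] for every row that has a duration."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("dur") != "-":
            rows.append((m.group("event"), float(m.group("dur"))))
    return rows


def rank_steps(rows, total):
    """Sort steps longest first and attach each step's share of the total run."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(event, dur, dur / total) for event, dur in ordered]


if __name__ == "__main__":
    total = 115.579911  # "TIME TOTAL" from the same run
    for event, dur, share in rank_steps(parse_timing(TIMING_TEXT), total)[:4]:
        print(f"{event:<40} {dur:10.3f}s {share:6.1%}")
```

Pasting the TIME SPENT block from a "slow" run and a "normal" run into `TIMING_TEXT` would show whether the spikes always come from the ping step or sometimes from elsewhere.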
The lines in the error output are the result of configuration added by another team. I can correct them by changing some scripts, but I don't think they are causing the problem I'm looking into.
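Since the duplicate-host warnings come from bb-hosts, a rough Python sketch like the one below could list hostnames that appear more than once and IPs shared by several host lines. Whether shared IPs have anything to do with the "More than one ping result" messages is an unverified guess. The sketch assumes the usual `IP hostname # tags` line layout, does not follow `include` directives, and treats 0.0.0.0 as the "look it up in DNS" placeholder; `find_duplicates` is a hypothetical helper name and the sample lines are made up.

```python
import re
from collections import defaultdict

# Assumption: host lines in bb-hosts look like "IP hostname # tags". Directive
# lines (page, group, include, ...) and comments do not start with an IPv4
# address, so this pattern skips them. "include" files are NOT followed here.
HOST_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)")

# Assumption: 0.0.0.0 is the placeholder for "look the address up in DNS",
# so hosts sharing it are not reported as duplicate IPs.
DNS_PLACEHOLDER = "0.0.0.0"


def find_duplicates(lines):
    """Return ({hostname: [line numbers]}, {ip: [hostnames]}) for repeated entries."""
    by_name = defaultdict(list)
    by_ip = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        m = HOST_LINE.match(line)
        if not m:
            continue
        ip, name = m.groups()
        by_name[name].append(lineno)
        if ip != DNS_PLACEHOLDER:
            by_ip[ip].append(name)
    dup_names = {n: nums for n, nums in by_name.items() if len(nums) > 1}
    dup_ips = {ip: names for ip, names in by_ip.items() if len(names) > 1}
    return dup_names, dup_ips


if __name__ == "__main__":
    # Hypothetical sample; in practice pass open(<bb-hosts path>).read().splitlines().
    sample = [
        "page main Main page",
        "192.0.2.10 foo # conn",
        "192.0.2.11 bar # conn",
        "192.0.2.10 foo # http://foo/",
        "0.0.0.0 baz # conn",
        "0.0.0.0 qux # conn",
    ]
    names, ips = find_duplicates(sample)
    print(names)  # {'foo': [2, 4]}
    print(ips)    # {'192.0.2.10': ['foo', 'foo']}
```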
The bit I'm interested in is "PING test completed (1896 hosts)" and why it takes 70 seconds.
bbtest-net is set to run in hobbitlaunch.cfg like this:

CMD bbtest-net --report --ping --checkresponse --concurrency=512 --no-ares
If I run it manually from the command line like this, the ping tests complete in less than a millisecond:

/bbtest-net --no-ares --report --ping --checkresponse --concurrency=512 --no-update --debug

PING test completed (1896 hosts)        9600209.001178    0.000537
The reason I'm investigating is that we often get a flurry of false connectivity alerts from xymon whenever the bbtest-net run time spikes for some reason.
There is another (contingency) host on which I'm doing some tests. When I run bbtest-net manually on this host, the ping tests take about 40s. The --debug output there doesn't explain the time taken; the timestamps go straight from "TCP tests completed" to 40s later:

2014-02-28 11:05:51 TCP tests completed normally
2014-02-28 11:06:33 More than one ping result for 192.168.180.94

I have many of these "More than one ping result for <IP>" messages. Could they be contributing? I was assuming not.
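To check where the roughly 40 seconds go on the contingency host, and how many IPs produce the "More than one ping result" message, the --debug output could be scanned with a short Python sketch. It assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above; `largest_gap` and `duplicate_ping_counts` are hypothetical helper names, not Xymon functions. On the two quoted lines it reports a 42-second gap after "TCP tests completed normally".

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each --debug line starts with "YYYY-MM-DD HH:MM:SS ", as in the
# excerpt above. Lines without a timestamp are ignored.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
DUP_PING = re.compile(r"More than one ping result for (\S+)")


def largest_gap(lines):
    """Return (seconds, message_before, message_after) for the biggest jump
    between consecutive timestamped lines."""
    best = (0.0, None, None)
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if prev is not None:
            gap = (ts - prev[0]).total_seconds()
            if gap > best[0]:
                best = (gap, prev[1], m.group(2))
        prev = (ts, m.group(2))
    return best


def duplicate_ping_counts(lines):
    """Count "More than one ping result for <IP>" messages per IP."""
    counts = Counter()
    for line in lines:
        m = DUP_PING.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The two lines quoted above; in practice feed the whole --debug output.
    debug_lines = [
        "2014-02-28 11:05:51 TCP tests completed normally",
        "2014-02-28 11:06:33 More than one ping result for 192.168.180.94",
    ]
    gap, before, after = largest_gap(debug_lines)
    print(f"{gap:.0f}s gap between '{before}' and '{after}'")
    print(dict(duplicate_ping_counts(debug_lines)))
```

Comparing the number of distinct IPs in `duplicate_ping_counts` with the "Hosts total" figure might hint at whether the duplicates are widespread enough to matter.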
Does anyone have any pointers for things I could look at or test? The bb-network.log doesn't have anything other than the error output previously mentioned.
On an unrelated note, a big thanks to everyone for participating on these lists (particularly Henrik for supporting and providing a great product). It has been a great help to me over the years.
Regards, Ross