Is there a limit on the number of hosts that can be polled?
Once I get to around 2700 hosts, Hobbit stops updating. Is there a cause for this in Hobbit, such as a memory or host limit? top shows very little memory or CPU utilization on my system, and all hosts go purple for lack of updates. I can get Hobbit updating again by commenting out hosts and restarting it, but I still have roughly another fifteen hundred hosts to enter. I have been adding hosts slowly, in groups of fewer than two hundred, which seems to work better: before that, I was seeing everything go purple at under two thousand hosts.
Jon Smith Network Support Technician Time Warner Cable
bbtest-net version 4.2.0
Statistics:
Hosts total : 2672
Hosts with no tests : 19
Total test count : 2656
Status messages : 2657
Alert status msgs : 0
Transmissions : 28
TCP test statistics:
TCP tests total : 3
HTTP tests : 1
Simple TCP tests : 2
Connection attempts : 3
bytes written : 137
bytes read : 374
TIME SPENT
Event                                      Starttime          Duration
bbtest-net startup                         1232031572.798610
Service definitions loaded                 1232031572.804664    0.006054
Tests loaded                               1232031606.460960   33.656296
DNS lookups completed                      1232031606.547165    0.086205
Test engine setup completed                1232031606.571986    0.024821
TCP tests completed                        1232031606.575635    0.003649
PING test completed (2653 hosts)           1232031711.407204  104.831569
PING test results sent                     1232031711.435915    0.028711
Test result collection completed           1232031711.435934    0.000019
LDAP test engine setup completed           1232031711.435940    0.000006
LDAP tests executed                        1232031711.435947    0.000007
LDAP tests result collection completed     1232031711.435953    0.000006
Test results transmitted                   1232031711.575185    0.139232
bbtest-net completed                       1232031711.578223    0.003038
TIME TOTAL                                                     138.779613
Go Green! Print this email only when necessary. Thank you for helping Time Warner Cable be environmentally responsible.
This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
Since nobody has taken a shot at this: while you are OK on memory and CPU, have you looked at your other resources? With that many hosts reporting back to a master, I would suspect I/O flooding off your interface...
Just a thought ....
Brian
lurch at inorbit.com
You're definitely not at Hobbit's maximum as this user has twice the number of hosts!
http://en.wikibooks.org/wiki/The_hobbit_Users_list#Steria
Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
I would agree with this; the disk subsystem is probably unable to keep up with the I/O load. Use "iostat 30" or "vmstat 30" to determine the iowait percentage, which is probably very high. To fix it, get rid of any RAID 5/6 (even if handled by a dedicated controller) or LVM, and possibly use faster disks. The best balance between performance and data redundancy is RAID 10, but obviously it costs more because there are more disks. For write-intensive tasks like this, even JBOD is a better performance option than RAID 5. Because I never use LVM, I don't really know why it causes problems, but I know from others' experience that it does.
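For anyone who wants the iowait figure as a single number rather than reading it off iostat or vmstat, here is a minimal sketch. It assumes a Linux server (the operating system has not been stated in this thread) and reads the aggregate "cpu" line of /proc/stat; the function names are made up for illustration and are not part of Hobbit.

```python
def cpu_times(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    first, second = cpu_times(before), cpu_times(after)
    # Use only user, nice, system, idle, iowait, irq, softirq, steal.
    # The trailing guest fields are already counted inside user and nice.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    # iowait is the fifth field, index 4.
    return 100.0 * deltas[4] / total if total else 0.0


def read_proc_stat(path="/proc/stat"):
    """Read one sample; Linux only (assumption)."""
    with open(path) as handle:
        return handle.read()
```

To mirror "vmstat 30", call read_proc_stat(), wait 30 seconds, call it again, and pass both samples to iowait_percent(). A value that stays high while user and system time stay low would be consistent with the disk-bound theory.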
The problem with RAID 5 and RAID 6 is the write penalty that comes from
the need to calculate and write parity data. A good controller with
memory for write caching can mitigate this in many typical
circumstances, but only if the entire transaction fits in the cache
memory and can be flushed to disk before the next data flood comes in.
In this case, it takes about 2700 hosts to generate more data than the
system can write before more arrives.
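To put rough numbers on the write penalty, here is a back-of-the-envelope sketch. The penalty factors are the commonly quoted rule-of-thumb values for small random writes with no help from a write cache (RAID 5: read data, read parity, write data, write parity), and the disk count and per-disk IOPS are made-up illustration values, not measurements from this server.

```python
# Rule-of-thumb back-end I/Os per small random write (assumed values).
WRITE_PENALTY = {"jbod": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(level, disks, iops_per_disk):
    """Approximate random-write IOPS an array can sustain without caching."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical array: 4 disks at 150 random IOPS each.
for level in ("jbod", "raid10", "raid5", "raid6"):
    print(f"{level:7s} {effective_write_iops(level, 4, 150):6.0f} write IOPS")
```

The JBOD figure assumes the writes are spread evenly over all the disks, which is optimistic. The point is only the ratio: with the same disks, RAID 5 sustains about half the random writes of RAID 10 under these assumptions.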
On Thu, Jan 15, 2009 at 10:16:11AM -0500, Smith, Jonathan wrote:
Once I start to get around 2700 hosts hobbit stops updating. Is there a cause for this in hobbit (memory or host limit)? Top shows my system with very little utilization of memory or cpu and all hosts go purple for no updates. I can get hobbit updating again by commenting out hosts and restarting it but I still have roughly another fifteen hundred hosts to enter.
Which tests go purple: the network tests (the "conn" status, since you seem to be doing mostly ping tests), or all of them, including the client-side tests (cpu, disk, memory etc.)?
You shouldn't have any problems with that number of hosts.
You're nowhere near the number of hosts I have in my production setup; I have about 5700 entries in bb-hosts, and my main network probe tests 4100 of them. It also seems your network tests complete well within the 300-second maximum poll time. What does the "bbgen" status say about the time it takes to build the web pages? And what's the I/O load on the Hobbit server? Check the "CPU utilization" graph in the "trends" column (NOT the "CPU load" one; you want the multi-color 'stacked' graph).
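The poll-time check can be repeated as hosts are added. The following is a small sketch that sums the "TIME SPENT" durations from the bbtest-net report posted at the start of this thread and compares the total with the 300-second poll time. The regular expression and function name are my own, written against the line layout shown in that report; it is not a Hobbit tool.

```python
import re

MAX_POLL_SECONDS = 300  # the maximum poll time mentioned in this thread

# Timing lines copied from the report in the original post.
REPORT = """\
bbtest-net startup 1232031572.798610
Service definitions loaded 1232031572.804664 0.006054
Tests loaded 1232031606.460960 33.656296
DNS lookups completed 1232031606.547165 0.086205
Test engine setup completed 1232031606.571986 0.024821
TCP tests completed 1232031606.575635 0.003649
PING test completed (2653 hosts) 1232031711.407204 104.831569
PING test results sent 1232031711.435915 0.028711
Test result collection completed 1232031711.435934 0.000019
LDAP test engine setup completed 1232031711.435940 0.000006
LDAP tests executed 1232031711.435947 0.000007
LDAP tests result collection completed 1232031711.435953 0.000006
Test results transmitted 1232031711.575185 0.139232
bbtest-net completed 1232031711.578223 0.003038
TIME TOTAL 138.779613
"""

# An event name, then a start time, then a duration. Lines with only one
# number (the startup line and TIME TOTAL) do not match and are skipped.
TIMING_LINE = re.compile(r"^(.+?)\s+(\d+\.\d+)\s+(\d+\.\d+)$")


def phase_durations(report):
    """Return (event, duration_in_seconds) pairs from a TIME SPENT report."""
    phases = []
    for line in report.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match:
            phases.append((match.group(1), float(match.group(3))))
    return phases


phases = phase_durations(REPORT)
total = sum(duration for _, duration in phases)
slowest = max(phases, key=lambda phase: phase[1])
print(f"total {total:.1f}s of {MAX_POLL_SECONDS}s allowed")
print(f"slowest phase: {slowest[0]} ({slowest[1]:.1f}s)")
```

On this report the ping run accounts for most of the time, and the total leaves more than half of the poll window free, which is why the network tests themselves do not look like the bottleneck.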
Are there any errors logged in the various Hobbit logfiles? Or any resource problems logged in the operating system logs, such as running out of sockets, network card issues, or other weird messages?
And what operating system is this on?
The limitations I've seen over time mostly have to do with the amount of disk I/O caused by the hobbitd_rrd RRD graph data collector (the update caching added in the current development version solves that problem completely), and with the network resources used for testing lots of hosts. Some systems have fairly small ARP caches, and this can cause all sorts of weird problems, because Hobbit will sporadically lose contact with itself or with the systems it is testing.
Regards, Henrik
participants (5)
- bcatlin@gmail.com
- elyograg@elyograg.org
- henrik@hswn.dk
- jon.d.smith@twcable.com
- josh@imaginenetworksllc.com