Daniel J McDonald wrote:
But having one or two servers that poll all of the others does scale well, because you don't have to install (and upgrade) hobbit clients on a hundred machines - just set up an rsa key and you are done. If the primary hobbit display/alarm/parse work is too much with the polling added, just use a second hobbit server for polling/parsing and feed the results to the display server...
I disagree. The distributed system scales much better, as the remote servers are sending in their results in parallel. While fping is able to test remote hosts in parallel, the other tests (bb-fetch, etc.) are done serially.
Let's say you have 1000 hosts. Let's then, just for fun, pretend that it only takes 1 second to log into a remote host, run several tests, and receive the results (it would actually take a bit longer than that).
1000 hosts x 1 second = 1000 seconds, and 1000 seconds / 60 = 16.67 minutes to poll those hosts!
So then you can say, oh well, just have 2-3 hobbit servers doing the polling. Now you have 3 hobbit servers to deal with: monitoring them, upgrading them, etc.
Now let's look at a *real world* example of how long it takes to ssh in and execute a command:
[hobbit@hobbit ~]$ time ssh myhost.net df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c5t0d0s0       30G    10G    20G    35%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   6.6G  1000K   6.6G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
fd                       0K     0K     0K     0%    /dev/fd
swap                   6.6G    16K   6.6G     1%    /tmp
swap                   6.6G    32K   6.6G     1%    /var/run
/dev/md/dsk/d0         639G   116G   523G    19%    /raid
/dev/md/dsk/d1         807G   504G   304G    63%    /raid2

*real    0m1.912s*
user    0m0.022s
sys     0m0.008s
Almost 2 seconds there, and that is for just one command. At that rate, even 2 hobbit servers polling simultaneously would still take over 15 minutes to poll 1000 servers (1000 x 1.9 seconds / 2 is roughly 16 minutes). Having hobbit do the ssh's in parallel wouldn't work either: I have tried something similar on far fewer hosts, and even with the -c blowfish option the server CPU still hit 100% from all the overhead.
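For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above. It assumes each poller works through its share of the hosts strictly one after another and that every host takes the same time; the function name poll_minutes is made up for illustration and is not part of hobbit or bb-fetch.

```python
def poll_minutes(hosts, seconds_per_host, pollers=1):
    """Minutes for one full polling pass.

    Assumes serial polling per poller and an even split of hosts
    across pollers (a simplification, not how hobbit schedules work).
    """
    return hosts * seconds_per_host / pollers / 60


# Optimistic case from above: 1000 hosts, 1 second each, one poller.
print(round(poll_minutes(1000, 1.0), 2))

# Measured ssh time of 1.912 seconds, split across two pollers.
print(round(poll_minutes(1000, 1.912, pollers=2), 2))
```

The first call gives about 16.67 minutes and the second about 15.93, which is where the "over 15 minutes" figure comes from.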
The way that I get around this is to have bbproxy running on a DMZ host, and have the hobbit/bb clients configured to use the bbproxy IP as their BBDISPLAY, which then forwards the traffic out of the DMZ to my hobbit server. That is not 100% secure, but using bb-fetch isn't either: an attacker could compromise one of the remote servers and modify one of the commands that the hobbit user executes, giving them the ability to communicate with the hobbit server and inject something to break the parsing engine, trigger buffer overflows, etc. I will stop talking about that now as I am getting off subject :)
I agree that having similar functionality to bb-fetch could be useful for a *few* remote/DMZ hosts, but it certainly doesn't scale well. Once the time needed to poll all of your hosts exceeds the hobbit refresh interval, you are done. I know it would be "nice" if we didn't have to upgrade remote clients and maintain them, but your solution involves ssh keys, so just use those same keys and a script to roll out the updates :)
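To put a rough number on where that limit falls, here is a small sketch. The 300 second refresh interval is only an assumption for the sake of the example (substitute whatever your server actually uses), the 1.912 seconds is the ssh timing measured above, and max_hosts is a made-up helper name.

```python
import math


def max_hosts(refresh_seconds, seconds_per_host, pollers=1):
    """Largest host count one polling pass can cover within the interval.

    Assumes serial polling per poller and identical per-host time.
    """
    return math.floor(refresh_seconds * pollers / seconds_per_host)


REFRESH_SECONDS = 300  # assumed interval, not taken from any hobbit config

print(max_hosts(REFRESH_SECONDS, 1.912))             # one polling server
print(max_hosts(REFRESH_SECONDS, 1.912, pollers=2))  # two polling servers
```

Under those assumptions a single poller tops out at 156 hosts and two pollers at 313, far short of 1000.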
-Charles