On 1/25/08 1:43 PM, "Charles Jones" <jonescr at cisco.com> wrote:
I think Henrik's stance on having the server collect data via ssh connections just doesn't scale. Sure, it works fine for a few dozen hosts, but let's say you have 2000 servers... now you are expecting to be able to make 2000 trouble-free ssh connections before the next polling cycle begins. This introduces many problems:
- How many ssh sessions can you run at the same time without spiking the load on the hobbit server?
The latest revision is threaded. The thread count is a parameter and is easily changed. If I remember correctly, we had 11 threads finishing all the nodes in under 90 seconds. Also, going to a threaded architecture brought CPU utilization from 80% down to 5%. I was astonished. So we figured you could have hundreds of threads running on the single-socket HP rx1600 node we were using for hobbit.
- What happens when an ssh session hangs? (It could hang the hobbit server, or make the poll cycle take too long.)
Since there were many threads, a hung thread would hardly be noticed if it weren't for the purple that node would turn.
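To make the threaded approach concrete, here is a minimal sketch in Python (not hobcen itself, which was C with POSIX threads). It assumes a fixed-size worker pool whose size is a parameter, and a per-host timeout so that a hung ssh session only costs one worker for a bounded time. The function names, the `uptime` command, and the timeout values are all hypothetical placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def collect_via_ssh(host, timeout=30):
    """Run a data-gathering command on `host` over ssh and return its output.

    `uptime` is a placeholder for whatever command gathers the client data.
    """
    # BatchMode=yes makes ssh fail instead of prompting for input;
    # ConnectTimeout bounds the connection phase.
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "uptime"]
    # `timeout` bounds the whole session, so a hung session is killed
    # rather than holding a worker thread forever.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    result.check_returncode()
    return result.stdout


def poll_hosts(hosts, collect=collect_via_ssh, workers=11):
    """Poll all hosts with a fixed-size thread pool.

    Returns {host: (ok, data_or_error)}. A host with ok == False is one
    that would go purple because no data came back for it.
    """
    def safe_collect(host):
        try:
            return host, (True, collect(host))
        except Exception as exc:  # timeout, non-zero exit, connection failure
            return host, (False, repr(exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(safe_collect, hosts))
```

One failed or hung host does not stop the others: the exception is caught inside the worker and reported per host, and the rest of the pool keeps going.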
You do know about the "pulldata" option? It allows the Hobbit server to do a "pull" instead of waiting for a client "push". This works fairly well, and I am using it in a production environment. I can see how it would not scale too well either, though, for a really large number of hosts.
To picture the scalability, imagine a server that only has to receive updates from hobbit clients. All it has to do is listen on port 1984, and using relatively little CPU it can probably handle a constant flow of client updates.
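As a rough illustration of how little the push model asks of the server, here is a minimal listener sketch in Python. It is not the Hobbit daemon and does not parse the Hobbit message format; it only assumes that a client connects to port 1984, sends its update, and closes the connection. The `received` list and the `serve` function are hypothetical names for illustration.

```python
import socketserver

# (peer address, raw message) tuples; a real server would parse and store these.
received = []


class UpdateHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read until the client closes its side of the connection.
        message = self.rfile.read()
        received.append((self.client_address[0], message))


def serve(host="0.0.0.0", port=1984):
    """Accept client updates forever, one thread per connection."""
    with socketserver.ThreadingTCPServer((host, port), UpdateHandler) as server:
        server.serve_forever()
```

The server does no outbound work at all here: each update costs one accept and one read.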
Now imagine a server that has to go and fetch the client data itself. There is a LOT more overhead and processing involved in launching an outgoing ssh connection, running a remote client data-gathering command, waiting for the output, etc. Imagine 2000 of those firing off every 5 minutes. How many simultaneous ssh sessions can your server handle? I've seen a server brought to its knees by a script that ran amok and was doing 50 simultaneous scp commands :) Some time is saved by using msgcache (no waiting for the data-gathering), but there is still the overhead of ssh itself, and having key-based ssh ability could be deemed a security risk (anyone who hacks into the hobbit server could then ssh to all of your client machines without a password).
I don't know how you secure your servers, but nobody is getting into my hobbit/hobcen servers without authorization. Believe it or not, there are ways to prevent unauthorized access. The caveat here is that I don't put mine on a public IP. :-)
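The "2000 sessions every 5 minutes" question comes down to simple arithmetic. The sketch below is a back-of-the-envelope estimate only; the per-session time of 3 seconds is an assumed figure, not a measurement from either setup, and the function names are made up for illustration.

```python
import math


def required_sessions(hosts, seconds_per_session, cycle_seconds=300):
    """Minimum number of concurrent sessions needed to finish within one cycle."""
    return math.ceil(hosts * seconds_per_session / cycle_seconds)


def cycle_time(hosts, seconds_per_session, workers):
    """Approximate wall-clock time for one full pass with a fixed worker count."""
    return math.ceil(hosts / workers) * seconds_per_session


# Assuming 3 seconds per ssh session (an illustrative figure):
print(required_sessions(2000, 3))   # 20 concurrent sessions for a 300 s cycle
print(cycle_time(2000, 3, 11))      # 546 s with 11 workers, which overruns the cycle
```

So the worker count has to grow roughly in proportion to host count times session time; whether the server can sustain that many ssh sessions at once is the real open question.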
A good solution would be an ssl-encrypted, bi-directional protocol. This would allow secure transfer of client data, either push or pull, without the overhead, management, and security risks of using ssh.
In the meantime, definitely check out the pulldata+msgcache option, as it sounds like it will do what you want.
I have not looked at the option you note; however, there are times when deploying clients is not an option. I suspect that is why bb-central was born and why I developed hobcen. Like I said, it started as a shell script, morphed into a C application and then into a POSIX-threaded C application. This was all based on shared ssh keys, but after coming from a stint in a DC with 60,000+ nodes on 3 acres of raised floor, I learned very quickly how to use ssh password auth for fast batch communications. :-)
We all have issues to resolve and like UNIX, there are 10 different ways to solve any one of them.
Cheers,
Tim
-Charles
Tim Rotunda wrote:
To answer Axel's "what is it" question... it's a Hobbit version of BB-Central, which runs on a central server like hobbit does. It reaches out to the clients via ssh (or whatever) and collects data. I did a shell script version a few years ago and it worked well until the client count topped 25-30. Then I migrated it to C and it would handle 60+ nodes pretty well. Then I migrated that to a multi-threaded C process and it really smoked. I never did reach the limit with that version. I think they are still using it and adding nodes to the client list, which is probably over 250 or so.
I was going to put it out to the community but my company would not allow it (idiots), so I couldn't. I now work only 40 hours a week, so I have some time to myself and was thinking about rewriting it from memory and putting it out there. I would put out the threaded one, and it would probably just be for x86 Linux, though it should build on Solaris, HP-UX, etc.