On Thu, Jun 09, 2005 at 07:41:42AM +0200, Henrik Stoerner wrote:
> It seems version 4.0 has reached some stability - there are still a few odd bug reports, but nothing that looks like major problems. So I thought it would be worthwhile to let you know what my plans are.
Well, there were lots of responses, so I'll try to pick out some trends and respond to them in one place.
The Hobbit client
Adam Goryachev:
> Personally, I'd most like to see a 'free' client (ie, GPL, without the BB license issue)
Right now the vote seems to be in favour of working on the client. It certainly will be GPL.
Craig Cook:
> Instead of writing a hobbit client from scratch, it may be worth looking at modifying the Nagios client.
That might be an idea, yes. I haven't spent much time with Nagios, so I am not really familiar with its architecture. I had the impression it was more SNMP-focused than BB/Hobbit.
SNMP - or "How to collect data"
Adam Goryachev:
> I'd also like to see *much* better SNMP support.
Craig Cook:
> Building SNMP support into core hobbit would be a good idea, it is also non trivial.
Daniel J McDonald:
> I'd really like the bb-central approach. Most of the status information can be grabbed from non-privileged accounts on all unix-like platforms. I concede that a client is necessary in the windows world.
There's no doubt that some sort of support in Hobbit for collecting data via SNMP would be very useful. However, I believe that's better implemented as a stand-alone tool, somewhat like the network tester. It would obviously rely on a library like Net-SNMP 5 for the dirty work of talking SNMP, which means it would support SNMP v1, v2c and v3 automatically. (Although I have at one point implemented SNMP daemons and MIB instrumentation from scratch, I'd rather not repeat that exercise :-))
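A minimal sketch of the polling half of such a stand-alone tool, in Python. For brevity it shells out to the Net-SNMP `snmpget` command-line tool instead of linking against the Net-SNMP library as proposed above, and it only handles v1/v2c community strings; the function names are hypothetical. The `-t` (timeout, seconds) and `-r` (retries) options bound how long a dead host can stall the poller.

```python
import shutil
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 from the MIB-II system group


def snmpget_argv(host, oid, community="public", version="2c",
                 timeout_s=2, retries=1):
    """Build the argument list for one Net-SNMP `snmpget` call."""
    if version not in ("1", "2c"):
        # SNMPv3 uses -u/-l/-a/-x security options instead of a community
        # string; omitted from this sketch.
        raise ValueError("this sketch only handles SNMP v1 and v2c")
    return ["snmpget", "-v", version, "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def snmp_poll(host, oid=SYS_UPTIME_OID, **options):
    """Run snmpget once; return its output text, or None on any failure."""
    if shutil.which("snmpget") is None:
        return None  # Net-SNMP command-line tools not installed
    try:
        result = subprocess.run(snmpget_argv(host, oid, **options),
                                capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip() if result.returncode == 0 else None
```

A real collector would run many of these in parallel and turn the results into status or data messages; linking the library directly avoids one process per request.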
However, I don't want to base a Hobbit client on SNMP or any other central polling-style method of data collection. There are at least two reasons for that:
1. It doesn't scale well. My main setup has over 2000 boxes to monitor. Doing that from one central server would mean polling about 7 systems per second (2000 boxes in a 300-second cycle), and that just won't work. There are always some servers that are down and cause timeouts, and whether that happens via SNMP, ssh or some other protocol really doesn't matter. It probably works fine for a setup with 50 or even 100 systems, but not for me.
2. The central server needs to know about all kinds of systems. If my central polling server runs "ssh hobbit@clientIP uptime;df;ps", then the central server must know how to interpret the output. That's one of the major design problems in Hobbit currently: every time Redhat^Wsomeone comes up with a new layout for the "vmstat" output, Hobbit needs to be modified to recognize the new data.
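The scaling argument in the first point is simple arithmetic. The sketch below computes the required polling rate and, using Little's law (polls in flight = rate x average time per poll), how many polls a central poller must keep in flight. The 5% down-host share, 10-second timeout and 0.5-second response time are made-up illustration values, not measurements.

```python
import math


def required_poll_rate(hosts, cycle_s):
    """Hosts per second needed to visit every host once per cycle."""
    return hosts / cycle_s


def average_poll_time(down_fraction, timeout_s, ok_latency_s):
    """Mean time per poll when every down host costs a full timeout."""
    return down_fraction * timeout_s + (1 - down_fraction) * ok_latency_s


def polls_in_flight(rate_per_s, avg_poll_s):
    """Little's law: concurrency = arrival rate x average time in the system."""
    return math.ceil(rate_per_s * avg_poll_s)


if __name__ == "__main__":
    rate = required_poll_rate(2000, 300)  # about 6.7 hosts/second
    # Hypothetical mix: 5% of hosts down (10 s timeout), the rest answer in 0.5 s.
    avg = average_poll_time(0.05, 10.0, 0.5)
    print(f"rate: {rate:.2f} hosts/s")
    print(f"sequential pass: {2000 * avg:.0f} s (cycle is 300 s)")
    print(f"concurrent polls needed: {polls_in_flight(rate, avg)}")
```

With these assumed numbers a single sequential poller needs about 1950 seconds per pass, far beyond a 300-second cycle, so it has to keep roughly 7 polls in flight at all times, and every extra down host pushes that up.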
So my current idea is to design a new type of client. It won't generate "status" messages; it will just collect data. Imagine a client that just sends Hobbit a "client data" package like this:
    os: Linux
    osver: 2.6.11
    osid: Debian/Sarge i386
    uptime: 173201 seconds
    loadaverage: 0.4
    filesystem /: 26102 MB, 71% used
    inodes /: 10291029 total, 21% used
    filesystem /var: 102400 MB, 50% used
    inodes /var: 40182910 total, 7% used
This is one well-defined format that Hobbit needs to recognize. Based on these data, the Hobbit server can e.g. match filesystem utilisation against a configuration file on the server and generate the necessary status messages. The end result will look exactly like what you have today, but with much less complexity, since e.g. the RRD handlers need to know much less about the types of systems that report into Hobbit.
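A minimal sketch of the server side, assuming the one-`key: value`-per-line layout of the sample package above: parse the package, compare each filesystem's usage against per-filesystem thresholds, and emit a BB-style `status HOST.TEST COLOR text` line. The threshold table stands in for the server-side configuration file; its values and all function names are hypothetical.

```python
import re

SAMPLE = """os: Linux
osver: 2.6.11
osid: Debian/Sarge i386
uptime: 173201 seconds
loadaverage: 0.4
filesystem /: 26102 MB, 71% used
inodes /: 10291029 total, 21% used
filesystem /var: 102400 MB, 50% used
inodes /var: 40182910 total, 7% used
"""

# Hypothetical stand-in for a server-side config file: (warn %, panic %).
DISK_THRESHOLDS = {"default": (90, 95), "/var": (40, 60)}
FS_VALUE = re.compile(r"(\d+) MB, (\d+)% used")
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def parse_client_data(text):
    """Split 'key: value' lines into (key, value) pairs, keeping their order."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def disk_status(pairs, thresholds=DISK_THRESHOLDS):
    """Return the worst color and one report line per filesystem."""
    worst, report = "green", []
    for key, value in pairs:
        if not key.startswith("filesystem "):
            continue
        mount = key[len("filesystem "):]
        match = FS_VALUE.match(value)
        if match is None:
            continue
        pct = int(match.group(2))
        warn, panic = thresholds.get(mount, thresholds["default"])
        color = "red" if pct >= panic else "yellow" if pct >= warn else "green"
        if SEVERITY[color] > SEVERITY[worst]:
            worst = color
        report.append(f"{mount} {pct}% used ({color})")
    return worst, report


def status_message(host, pairs):
    """Build a BB-style status line for the 'disk' test."""
    color, report = disk_status(pairs)
    return f"status {host}.disk {color} " + "\n".join(report)
```

Because the thresholds live on the server, changing a limit means editing one file there rather than touching every client.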
The only drawback is that the client becomes slightly more advanced, but not by much: it's really just formatting the information differently before sending it off to the Hobbit server.
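On the client side the extra work is mostly formatting. A minimal Python sketch, assuming the same `key: value` layout as the sample package; it uses `shutil.disk_usage` and the Unix-only `os.statvfs` from the standard library, and the function names are hypothetical. Percentages here are simply used/total, which may differ slightly from what `df` prints because of reserved blocks.

```python
import os
import platform
import shutil


def format_client_data(fields):
    """Render (key, value) pairs as 'key: value' lines."""
    return "".join(f"{key}: {value}\n" for key, value in fields)


def filesystem_fields(mount):
    """Size and usage for one mount point, plus inode usage where available."""
    usage = shutil.disk_usage(mount)
    pct = round(100 * usage.used / usage.total) if usage.total else 0
    fields = [(f"filesystem {mount}",
               f"{usage.total // (1024 * 1024)} MB, {pct}% used")]
    if hasattr(os, "statvfs"):  # not available on Windows
        st = os.statvfs(mount)
        if st.f_files:  # some filesystems report no inode counts
            ipct = round(100 * (st.f_files - st.f_ffree) / st.f_files)
            fields.append((f"inodes {mount}", f"{st.f_files} total, {ipct}% used"))
    return fields


def collect(mounts=("/",)):
    """Gather a small client data package for the given mount points."""
    fields = [("os", platform.system()), ("osver", platform.release())]
    for mount in mounts:
        fields.extend(filesystem_fields(mount))
    return fields


if __name__ == "__main__":
    print(format_client_data(collect()), end="")
```

The client makes no decisions about colors or thresholds; it only reports numbers.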
Another very nice thing about this is that you can easily (well, relatively) write Hobbit modules that handle new kinds of information.
And it can be done without breaking compatibility with the existing clients, so you can run a mix of BB and Hobbit clients without any problems.
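One way to picture both points is a server-side dispatcher: legacy BB `status` messages pass through untouched, while a new client data message (assumed here to start with a hypothetical `client HOSTNAME` header line) is split into `key: value` lines and routed to whichever registered module claims the key prefix. Keys that no module claims are ignored, so a module can be added without changing existing clients. All names are illustrative.

```python
HANDLERS = {}


def handles(prefix):
    """Decorator: register a module for client-data keys starting with prefix."""
    def register(func):
        HANDLERS[prefix] = func
        return func
    return register


@handles("filesystem ")
def disk_module(key, value):
    return ("disk", key[len("filesystem "):], value)


@handles("inodes ")
def inode_module(key, value):
    return ("inode", key[len("inodes "):], value)


def dispatch(message):
    """Route one incoming message; returns a list of handler results."""
    header, _, body = message.partition("\n")
    kind = header.split(" ", 1)[0]
    if kind == "status":  # existing BB/Hobbit client: pass through unchanged
        return [("legacy-status", message)]
    if kind != "client":
        raise ValueError(f"unknown message type: {kind!r}")
    results = []
    for line in body.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        for prefix, module in HANDLERS.items():
            if key.startswith(prefix):
                results.append(module(key, value))
    return results
```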
Encryption/authentication and compression of status messages
Adam Goryachev:
> Finally, what about some sort of compression/encryption protocol, so that it is possible to do more frequent test/report without using so much bandwidth?
Daniel J McDonald:
> If we are building an extended protocol, we should support authentication as well.
There are already some IP-based access controls built into Hobbit; see the hobbitd man page for the --status-senders, --maint-senders, --www-senders and --admin-senders options. The first one should be sufficient to block most attempts at sending fake status messages: an attacker would need to break into your network test server and send the fake messages from there.
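The idea behind the sender restrictions is an allowlist check on the source address of each incoming message. A minimal sketch using Python's `ipaddress` module; the comma-separated list syntax is an assumption made for this example (see the hobbitd man page for the real option format), and this is not hobbitd's own code.

```python
import ipaddress


def parse_senders(spec):
    """Parse e.g. '10.0.0.5,192.168.1.0/24' into network objects."""
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in spec.split(",") if item.strip()]


def sender_allowed(sender_ip, allowed):
    """True if the sender's address falls inside any allowed network."""
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in allowed)


# Hypothetical allowlist: one network tester plus one client subnet.
STATUS_SENDERS = parse_senders("10.0.0.5,192.168.1.0/24")
```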
However, authentication could be nice. I am tempted to handle both of these problems with one solution and just implement an SSL-encrypted protocol, where you can then use client-side certificates for authentication. That adds significant processing overhead, but the good thing is that you can offload SSL to hardware devices fairly easily (and OpenSSL does support that kind of hardware).
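A minimal sketch of the server half of such a protocol using Python's standard `ssl` module: a server-side TLS context that refuses clients without a certificate signed by a trusted CA. The function names are illustrative, and mapping a verified certificate to a monitored host is left out.

```python
import socket
import ssl


def make_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """TLS server context that requires a client certificate (mutual TLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)  # the server's own identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that signs client certs
    return ctx


def serve_one(ctx, port):
    """Accept one TLS connection and return the client's certificate subject."""
    with socket.create_server(("", port)) as listener:
        conn, _ = listener.accept()
        with ctx.wrap_socket(conn, server_side=True) as tls:
            return tls.getpeercert()["subject"]
```

In use, the context would be built with real files, e.g. `make_server_context("server.pem", "server.key", "clients-ca.pem")` (hypothetical file names); the certificate subject then identifies which client sent the data.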
Compression... is it necessary? All of the status messages in my setup combined are about 6 MB for 2000 servers, i.e. 3 KB per server, and each server's status gets updated every 300 seconds (on average). That's 10 bytes/second, or 80 bits/second, per server, so a rough bandwidth estimate for Hobbit is 100 bps per server monitored. For a LAN, that's peanuts.
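The estimate above as a few lines of Python, so it can be redone with other numbers (decimal megabytes assumed, i.e. 1 MB = 1,000,000 bytes):

```python
def bandwidth_estimate(total_bytes, servers, interval_s):
    """Per-server message size, bytes/s and bits/s for status traffic."""
    per_server = total_bytes / servers
    bytes_per_s = per_server / interval_s
    return per_server, bytes_per_s, bytes_per_s * 8


if __name__ == "__main__":
    per_server, bytes_s, bits_s = bandwidth_estimate(6_000_000, 2000, 300)
    print(per_server, bytes_s, bits_s)           # 3000.0 10.0 80.0
    print("all servers:", 2000 * bits_s, "bit/s")  # 160000.0 bit/s
```

Even for all 2000 servers together this is about 160 kbit/s, which supports the conclusion that compression buys little on a LAN.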
Well, thanks for the feedback - it's really good to learn what ideas and problems are the important ones.
Regards, Henrik