On Thu, Dec 07, 2006 at 11:59:30AM -0800, Dan Simoes wrote:
I love hobbit and have been using it (and BB) for many years, so take this as constructive criticism.
One of my biggest headaches with BB (and now hobbit) has been the all-or-nothing nature of alerts. By this I mean that if your main network link is down, everything goes red for network status.
Something happened on my monitoring box (probably DNS) that caused a cascade of http errors. http was not truly down on all these N hosts on various networks; it was the network test that was failing on the monitoring box.
It's a valid point - but it is also very, very difficult to handle. Not so much because it is difficult to suppress alerts; the $1bn question is how to decide when to suppress an alert, and which issue is the root cause of all the problems we're seeing.
Heck, sometimes it can be difficult even for intelligent humans to figure out what is really going on ...
I think what this really boils down to is some form of event correlation mechanism, on top of which you then apply some heuristics (that's a fancy word for "guessing") to decide what is the core issue. E.g. if we have 200 tests reporting a failure because of a DNS lookup that timed out, then we probably have an issue with the DNS server we used. But it could also be a firewall mis-configuration that blocks our outbound DNS queries, or an IP address conflict that causes our DNS lookups to go to a server which doesn't handle DNS - it is really hard for any machine to figure that out by itself.
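To make the idea a bit more concrete, here is a minimal Python sketch of such a heuristic. It is only an illustration and not how Hobbit works: the `Failure` record, the reason strings ("dns-timeout" etc.) and the two thresholds are assumptions invented for the example. It groups the current failures by reported reason and, if one reason clearly dominates, returns it as a hint about the probable common cause.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One failing test. Hypothetical record, not a Hobbit data structure."""
    host: str
    test: str
    reason: str  # e.g. "dns-timeout", "connection-refused" (made-up labels)


def probable_common_cause(failures, min_count=20, min_share=0.8):
    """Return the dominant failure reason, or None if no reason dominates.

    min_count and min_share are arbitrary example thresholds: the reason
    must appear at least min_count times and account for at least
    min_share of all current failures.
    """
    if not failures:
        return None
    counts = Counter(f.reason for f in failures)
    reason, count = counts.most_common(1)[0]
    if count >= min_count and count / len(failures) >= min_share:
        return reason
    return None


# 200 http tests failing on a DNS timeout, plus one unrelated failure.
failures = [Failure(f"host{i}", "http", "dns-timeout") for i in range(200)]
failures.append(Failure("db1", "ssh", "connection-refused"))

hint = probable_common_cause(failures)
if hint is not None:
    print(f"{len(failures)} failures, probable common cause: {hint}")
```

Note that the sketch only produces a hint; it does not suppress anything. It also cannot tell a dead DNS server from a firewall blocking the queries, which is exactly the part that is hard to automate.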
The current implementation is not ideal, I'll be the first to admit that. Any ideas for improving it are welcome, but please consider the possibilities for the system making wrong decisions. I'd rather send out one alert too many than one too few.
I'm unaware of a solution to this issue, and I'm considering moving to another product because of it.
If you know of any products that are really good at handling this, I'd be interested to hear about them.
Lastly, who is maintaining the debian package for hobbit? Both the server and client packages still have the same bugs I reported months ago.
Since there haven't been any Hobbit releases since August, that really shouldn't come as a surprise.
Regards, Henrik