That is one thing I have thought about bringing up a few times - a summary alert.
When the power goes out or the WAN has issues, I get text messages about very important servers. The problem is that when they go up and down, it is very irritating to battle through even several messages on my phone. I have a BB8800, which lets me go through them pretty quickly, but for an admin with a RAZR, a dozen text messages would take several minutes to go through.
Maybe we could get some sort of toggleable proxy that all alerts pass through, and the proxy sends out one summary every 60 seconds? Just tossing ideas out here at this point.
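To make the idea a bit more concrete, here is a rough Python sketch of what such a proxy might do. It is not Hobbit code: the class name `AlertDigest`, its methods, and the message format are all made up for illustration, and the caller is assumed to pass in the current time in seconds.

```python
class AlertDigest:
    """Buffer alerts and emit one summary per window (hypothetical helper)."""

    def __init__(self, window_seconds=60, enabled=True):
        self.window_seconds = window_seconds
        self.enabled = enabled      # the "toggle": False passes alerts straight through
        self._latest = {}           # host -> last state reported in this window
        self._changes = {}          # host -> number of alerts seen in this window
        self._window_start = None   # time of the first alert in the current window

    def add(self, host, state, now):
        """Record an alert; return the messages to send right now (often none)."""
        if not self.enabled:
            return [f"{host} is {state}"]
        out = self.flush(now)       # send the previous window's summary if it is due
        if self._window_start is None:
            self._window_start = now
        self._latest[host] = state
        self._changes[host] = self._changes.get(host, 0) + 1
        return out

    def flush(self, now):
        """Return a one-item list holding the summary once the window has elapsed."""
        if self._window_start is None or now - self._window_start < self.window_seconds:
            return []
        parts = [f"{h} {s} ({self._changes[h]}x)" for h, s in self._latest.items()]
        self._latest.clear()
        self._changes.clear()
        self._window_start = None
        return [f"{len(parts)} host(s) changed: " + ", ".join(parts)]
```

A server that flaps several times inside the window shows up once in the summary, with its final state and a count, so a dozen text messages become one. Something would still need to call `flush()` on a timer so that the summary goes out even when no further alerts arrive.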
Josh
On Mon, Jun 16, 2008 at 2:07 PM, Linder, Doug (SABIC Innovative Plastics, consultant) <Doug.Linder at sabic-ip.com> wrote:
Sloan [mailto:joe at tmsusa.com] wrote:
We've not had a bb server go down in all the years we've been using it, but sometimes WAN connectivity goes away due to circumstances beyond our control.
This is by far the biggest annoyance we have with all system monitoring: when networks go down. It's a problem with every monitoring tool there is, and I can't think of any way to solve it: the monitoring system has no way of knowing whether a system is down because it crashed or because the network went down. All it knows is that it can't talk to the system anymore and something is wrong, so it generates an alert. When a whole network goes down, that can mean hundreds of simultaneous alerts. That's annoying enough when it's just email alerts. When you use Hobbit to generate cases in your trouble ticket system, it can mean hundreds of new, useless cases to close manually.
We don't want to raise the amount of time a system has to be down before Hobbit generates an alert, because we want to know as soon as possible. But if we keep that number too low, then when the network has a brief hiccup, we get hundreds of redundant cases. This is especially a problem with overseas networks on the WAN.
I think the only possible solution would be for Hobbit to have some kind of flood-detection routine built in, where it could tell how rapidly it was sending alerts about connection problems for machines all on the same network, and was smart enough to think, "Whoa, I'm about to send 100 connection alarms about systems on the same network. Instead of sending 100 of them, maybe I'll just send ONE alert saying 'You got a big problem here.'"
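As a rough illustration of that kind of flood detection (a sketch only, not anything Hobbit provides), the Python below groups unreachable hosts by network and collapses any large group into a single alert. The function name `collapse_alerts`, the /24 grouping, the threshold of 5, and the message wording are all assumptions chosen for the example, and it handles IPv4 addresses only.

```python
import ipaddress
from collections import defaultdict


def collapse_alerts(down_hosts, prefix_len=24, threshold=5):
    """Group unreachable hosts by network; collapse large groups into one alert.

    down_hosts: dict mapping hostname -> IPv4 address string.
    prefix_len and threshold are illustrative assumptions, not Hobbit settings.
    """
    by_network = defaultdict(list)
    for host, ip in down_hosts.items():
        # strict=False lets us derive the containing network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        by_network[net].append(host)

    alerts = []
    for net, hosts in sorted(by_network.items()):
        if len(hosts) >= threshold:
            # Many hosts on one network failing together: report the network once
            alerts.append(f"{len(hosts)} hosts unreachable on {net}: possible network outage")
        else:
            # Only a few failures: keep the individual per-host alerts
            alerts.extend(f"{h} unreachable" for h in sorted(hosts))
    return alerts
```

With six hosts down on 10.1.2.0/24 and one unrelated host down elsewhere, this produces two alerts instead of seven. It still cannot tell a crashed host from an unreachable one; it only reduces the number of alerts when many failures share a network.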
Doug Linder
-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer