It's a requirement as far as my setup is concerned.
We have the regular first-line pagers that get the alerts and can acknowledge as they start working on the issue. Some hosts and services are of higher priority within the company, and if they're down for an extended period of time, upper management wants to get a notification regardless of the ACK status. It's usually expected that the person with the pager has already acknowledged the alert before BB sends an alert to management, but when troubleshooting a difficult problem, time can fly by quite quickly.
The management can then go to the webpage, view the acked alert and see who acked it (based on the ID code) and contact them for a status report. This saves management the hassle of contacting all the pagers to find out who is working on a downed service, because they've been identified by the ACK code pager ID. With several different pagers in the group, management either needs to know which pager a particular host and service alert is sent to, or otherwise be able to easily identify who got paged and is working on the issue.
I also use an extension to the bb-ack page which lists all current alerts with their codes and which pager address or phone number the alert was sent to. If an alert hasn't been acked, management can go there and see who should have been responding to the issue. This has also been a handy place to send ACKs to multiple alerts at once.
The addition of the INFO column in BBGEN nicely displays what pagers a host and its services will send to, and this has also been a help to management.
Brent B McCrackin
UNIX Systems Specialist - Bell Sympatico
Brent.McCrackin at Bell.ca
PH: 416-353-0692
"Serenity through viciousness."
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: February 14, 2005 11:37 AM
To: hobbit at hswn.dk
Subject: [hobbit] Escalated alerts - necessary ?
I changed the subject, because this is a somewhat different issue than the rest of the mail Brent wrote:
On Mon, Feb 14, 2005 at 09:39:52AM -0500, brent.mccrackin at bell.ca wrote:
A feature I'd like to see is the ability to allow an identified acknowledge of an alert based on the two-digit code, that stops alerts for all recipients except escalation recipients (those being the people that need to be alerted if a downed service is not fixed after a specific time period regardless of someone working on it). This would do away with the need for a '99' acknowledge to stop alerts for everyone, and let the person responding to the alert work on fixing it faster (at least until the escalation person starts asking for status reports).
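The behaviour requested above can be sketched roughly as follows. This is only an illustration of the rule "an identified ACK silences everyone except escalation recipients"; it is not code from BB or Hobbit, and the names Recipient, escalation_after and recipients_to_alert are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recipient:
    address: str                            # pager address or phone number
    pager_id: str                           # hypothetical two-digit ID code
    escalation_after: Optional[int] = None  # minutes; None means first-line pager


def recipients_to_alert(recipients: List[Recipient],
                        minutes_down: int,
                        acked_by: Optional[str] = None) -> List[str]:
    """Return the addresses that should still be alerted.

    First-line pagers are alerted only while nobody has acknowledged.
    Escalation recipients are alerted once the outage has lasted long
    enough, whether or not the alert has been acknowledged.
    """
    to_alert = []
    for r in recipients:
        if r.escalation_after is not None:
            # Escalation recipient: the ACK status is ignored on purpose.
            if minutes_down >= r.escalation_after:
                to_alert.append(r.address)
        elif acked_by is None:
            # First-line pager: silenced as soon as anyone acknowledges.
            to_alert.append(r.address)
    return to_alert
```

With two first-line pagers and one management recipient set to escalate after 60 minutes, an ACK from pager "01" stops the first-line alerts immediately, while management is still notified once the service has been down for an hour. The value of acked_by would be what the web page shows as the person working on the issue.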
Hobbit does not have the concept of "escalating" an alert that BB has.
I didn't fully understand what BB's idea of "escalating" an alert meant until I read Brent's message. I see that it could be useful, but also that it will be somewhat tricky to implement with the current design of Hobbit's alert module.
So - how much do you use it ? Do you need to have alerts going out for problems that have been acknowledged ?
Regards, Henrik