Hi,
I've been asked to make alerts send at most 2 emails, and still send a RECOVERED message when things recover. As a quick test, I set DURATION<30 and REPEAT=15. After forcing a service down for over an hour, I got the two email alerts, but after bringing the service back up I was not sent a recovered message.
Is it possible to do this?
Alan
On Wed, Jun 29, 2005 at 01:27:45PM -0400, Killenbeck, Alan wrote:
> I've been asked to make alerts send at most 2 emails, and still send a RECOVERED message when things recover. As a quick test, I set DURATION<30 and REPEAT=15. After forcing a service down for over an hour, I got the two email alerts, but after bringing the service back up I was not sent a recovered message.
Hmm - hadn't thought about that. I'd say it ought to work, but looking at the way recovery messages are handled it seems you're right - if the max. duration has been reached, the recovery message is never sent.
It's a bug. Will fix.
Regards, Henrik
On Wed, Jun 29, 2005 at 08:55:31PM +0200, Henrik Stoerner wrote:
> On Wed, Jun 29, 2005 at 01:27:45PM -0400, Killenbeck, Alan wrote:
>> I've been asked to make alerts send at most 2 emails, and still send a RECOVERED message when things recover.
> It's a bug. Will fix.
I think this patch should do it.

--- hobbitd/do_alert.c 2005/06/06 09:27:07 1.69
+++ hobbitd/do_alert.c 2005/06/29 18:58:54
@@ -960,20 +960,26 @@
     /* At this point, we know the configuration may result in an alert. */
     if (anymatch) (*anymatch)++;
 
-    duration = (time(NULL) - alert->eventstart);
-    if (crit && crit->minduration && (duration < crit->minduration)) {
-        traceprintf("Failed '%s' (min. duration %d<%d)\n", cfline, duration, crit->minduration);
-        if (!printmode) return 0;
-    }
+    /*
+     * Time checks should be done on real paging messages only.
+     * Not on recovery- or notify-messages.
+     */
+    if (alert->state == A_PAGING) {
+        duration = (time(NULL) - alert->eventstart);
+        if (crit && crit->minduration && (duration < crit->minduration)) {
+            traceprintf("Failed '%s' (min. duration %d<%d)\n", cfline, duration, crit->minduration);
+            if (!printmode) return 0;
+        }
 
-    if (crit && crit->maxduration && (duration > crit->maxduration)) {
-        traceprintf("Failed '%s' (max. duration %d>%d)\n", cfline, duration, crit->maxduration);
-        if (!printmode) return 0;
-    }
+        if (crit && crit->maxduration && (duration > crit->maxduration)) {
+            traceprintf("Failed '%s' (max. duration %d>%d)\n", cfline, duration, crit->maxduration);
+            if (!printmode) return 0;
+        }
 
-    if (crit && crit->timespec && !timematch(crit->timespec)) {
-        traceprintf("Failed '%s' (time criteria)\n", cfline);
-        if (!printmode) return 0;
+        if (crit && crit->timespec && !timematch(crit->timespec)) {
+            traceprintf("Failed '%s' (time criteria)\n", cfline);
+            if (!printmode) return 0;
+        }
     }
 
     /* Check color. For RECOVERED messages, this holds the color of the alert, not the recovery state */
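To see the effect of the patch in isolation, here is a minimal standalone sketch of the same gating idea. It is not the hobbitd code: the struct layouts, the A_RECOVERED and A_NOTIFY names, the helper name duration_allows, and the choice to pass the current time as a parameter are all assumptions made for illustration. The time-of-day (timespec) check is left out. Only A_PAGING and the minduration/maxduration fields come from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for the hobbitd types; only the fields used here.
 * A_PAGING appears in the patch; the other two state names are assumed. */
enum alert_state { A_PAGING, A_RECOVERED, A_NOTIFY };

struct criteria {
    int minduration;   /* seconds; 0 means no minimum is configured */
    int maxduration;   /* seconds; 0 means no maximum is configured */
};

struct alert {
    enum alert_state state;
    time_t eventstart; /* when the problem began */
};

/*
 * Returns 1 if the duration limits allow a message to be sent, 0 if not.
 * As in the patch, the limits apply to paging messages only, so a
 * recovery or notify message passes regardless of how long the event lasted.
 */
int duration_allows(const struct alert *alert, const struct criteria *crit, time_t now)
{
    int duration;

    assert(alert != NULL);

    if (alert->state != A_PAGING) return 1;

    duration = (int)(now - alert->eventstart);
    if (crit && crit->minduration && (duration < crit->minduration)) return 0;
    if (crit && crit->maxduration && (duration > crit->maxduration)) return 0;

    return 1;
}
```

With a 30-minute maximum, a paging message 10 minutes into the event is allowed, a paging message after an hour is suppressed, and a recovery message after an hour is still allowed, which is the behaviour Alan was asking for.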
Killenbeck, Alan wrote:
> I've been asked to make alerts send at most 2 emails, and still send a RECOVERED message when things recover.
> As a quick test, I set DURATION<30 and REPEAT=15. After forcing a service down for over an hour, I got the two email alerts, but after bringing the service back up I was not sent a recovered message.
> Is it possible to do this?
From what you wrote, you're going to get more than 2 emails if the system remains offline longer than an hour. The duration means the service can be out 30 mins before you get your first email, but after that threshold is reached, you'll get another every 15 mins *until* it is fixed. It won't stop after the second one. So, if you're down for 2 hours, you should see ~6 messages: the first one 30 mins after the crash, then the rest every 15 mins thereafter. I'm not aware of any way to restrict the emails to exactly 2.
As for a recovered message, did you add the RECOVERED option to your alert rules? Hobbit doesn't send recovery messages unless you explicitly ask it to.
Tom
Tom Georgoulias wrote:
> Killenbeck, Alan wrote:
>> I've been asked to make alerts send at most 2 emails, and still send a RECOVERED message when things recover.
>> As a quick test, I set DURATION<30 and REPEAT=15. After forcing a service down for over an hour, I got the two email alerts, but after bringing the service back up I was not sent a recovered message.
>> Is it possible to do this?
Oops, scratch my last message. I thought the < in DURATION was pointing the other way. ;)
Tom
participants (3)
- Alan.Killenbeck@xerox.com
- henrik@hswn.dk
- tgeorgoulias@mcclatchy.com