GROUPs and recovery alerts
On 05-07-2011 11:58, Heather Keen wrote:
Anyway, I think this is a BUG.
Xymon Version 4.3.3. Configuration as follows:
analysis.cfg:
    HOST=myhost.mydomain.com GROUP=heather PROC TESTtestTEST 1
alerts.cfg:
    HOST=*
        MAIL heather1 at mydomain.com RECOVERED
    GROUP=heather
        MAIL heather2 at mydomain.com RECOVERED
When the alert is generated, both e-mail addresses get the notification. But when the alert is cleared, only heather1 at mydomain.com gets the recovery message.
I've tried lots of different configuration options, and the only conclusion I can come to is that recovery messages to GROUPs do not work. :(
It's certainly not what you would expect - must agree with that. But solving it is not quite as easy as one would expect.
The problem is that when the PROC triggers a red status, Xymon knows that the rule was one that included a "GROUP=heather" setting. But when the recovery happens, it is because none of the rules in analysis.cfg triggered. So Xymon does not know that the green status is a recovery from a rule that contained the GROUP setting.
There is some state lost here.
To solve this, the xymond_alert module will have to keep track of the active alerts, and which GROUP settings triggered them. When the recovery happens, it will then use that list of groups that received the alert as the basis for sending out the recovered-notices.
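The bookkeeping described above could be sketched roughly as follows. This is only an illustration in Python, not code from xymond_alert (which is written in C); the class name `ActiveAlertTracker` and its methods are hypothetical.

```python
from collections import defaultdict


class ActiveAlertTracker:
    """Hypothetical helper: remembers which GROUPs triggered each active alert."""

    def __init__(self):
        # (host, service) -> set of GROUP names that caused the alert
        self._groups = defaultdict(set)

    def alert(self, host, service, group):
        """Record that an alert for host/service was raised by a rule with this GROUP."""
        self._groups[(host, service)].add(group)

    def recover(self, host, service):
        """Return the GROUPs that were alerted and forget the alert."""
        return self._groups.pop((host, service), set())


tracker = ActiveAlertTracker()
tracker.alert("myhost.mydomain.com", "procs", "heather")

# On recovery, the stored groups decide who gets the recovered-notice.
groups_to_notify = tracker.recover("myhost.mydomain.com", "procs")
second_recovery = tracker.recover("myhost.mydomain.com", "procs")
print(groups_to_notify)
```

The point of the sketch is that the group information is saved when the alert fires, so it is still available when the status goes green and no analysis.cfg rule matches any more.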
It can be solved, of course. Just don't be disappointed when you see 4.3.4 being released later today without a fix for this problem.
Regards, Henrik
On 01-08-2011 17:14, Henrik Størner wrote:
After looking at this once again, I actually think there is a very simple solution to this after all. If we don't check the GROUP rules at all for recovery-messages (i.e. any group setting will match), then xymond_alert will consider all the possible recipients. However, there is another check so it only sends recovery-messages to those recipients that actually did receive the alert. So I think the attached patch should solve this.
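As a rough illustration of that idea (a Python sketch, not the attached patch; `Rule`, `rule_matches` and `select_recipients` are hypothetical names, and the recipient strings are placeholders):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    group: Optional[str]  # None means the rule has no GROUP= restriction
    recipient: str


def rule_matches(rule, event_group, is_recovery):
    """GROUP is only checked for alerts; for recoveries any group setting matches."""
    if is_recovery:
        return True
    return rule.group is None or rule.group == event_group


def select_recipients(rules, event_group, is_recovery, alerted):
    """Pick recipients; recoveries go only to those who received the alert."""
    selected = []
    for rule in rules:
        if not rule_matches(rule, event_group, is_recovery):
            continue
        if is_recovery and rule.recipient not in alerted:
            continue
        selected.append(rule.recipient)
    return selected


rules = [
    Rule(None, "heather1"),
    Rule("heather", "heather2"),
    Rule("other", "someone-else"),
]

# Alert: triggered by a rule carrying GROUP=heather.
alerted = set(select_recipients(rules, "heather", False, set()))
# Recovery: the group is unknown, so it is not checked at all.
recovered = select_recipients(rules, None, True, alerted)
print(sorted(alerted), recovered)
```

In this sketch the recovery considers all three rules, but the "did receive the alert" check drops the recipient that was never notified.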
Regards, Henrik
Henrik,
I've been doing a bit more testing with alerts using GROUPs, and I've discovered a slight flaw with this solution when the recipient is a SCRIPT rather than MAIL. Because the GROUP is not checked when a RECOVERED message is sent, the same person can end up getting multiple RECOVERED messages (tested with v4.3.7).
For example:
GROUP=A SERVICE=procs RECOVERED COLOR=red
    SCRIPT /home/xymon/server/ext/sms_notification 447777123456 FORMAT=SMS DURATION>5
GROUP=B SERVICE=procs RECOVERED COLOR=red
    SCRIPT /home/xymon/server/ext/sms_notification 447777123456 FORMAT=SMS DURATION>10
So you've got two groups of machines, each having the same recipient, but needing a different alert delay. Now, if procs goes red on a machine in group A, the red alert is handled fine, but when it recovers, 447777123456 actually gets two recovery messages.
Note that this only happens if the recipient is a SCRIPT command; it works fine if you use MAIL recipients.
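One way to picture the duplicate is the toy model below. It is a Python sketch under a loud assumption: that the "did this recipient receive the alert" check is keyed on the recipient alone and not on the rule that sent the alert. That assumption is not confirmed against the xymond_alert source, and the model does not explain why MAIL recipients behave differently.

```python
# Two rules, one per group, both naming the same SCRIPT recipient.
rules = [
    {"group": "A", "script": "sms_notification", "recipient": "447777123456"},
    {"group": "B", "script": "sms_notification", "recipient": "447777123456"},
]


def recovery_messages(rules, alerted_recipients):
    """Recovery with the GROUP check skipped: every rule is considered.

    ASSUMPTION: the only remaining filter is whether the recipient
    was alerted, regardless of which rule sent that alert.
    """
    return [r["recipient"] for r in rules if r["recipient"] in alerted_recipients]


# procs went red on a machine in group A, so only the first rule sent an alert.
alerted = {"447777123456"}
messages = recovery_messages(rules, alerted)
print(len(messages))
```

Under that assumption both rules pass the recovery check, so the same number is notified twice even though only the group A rule ever alerted.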
Help!
Cheers, Heather