On Wed, Feb 02, 2005 at 08:56:22AM -0500, Tom Georgoulias wrote:
HOST=$FOUND_SYS MAIL broken at nandomedia.com SERVICE=procs COLOR=red DURATION>5 REPEAT=5
After I add this rule, I restart hobbit. I read on the list that restarting isn't necessary, but it has been my experience that changes made to hobbit-alerts.cfg do not always get put into effect unless hobbit is restarted.
It shouldn't be needed, but it does no harm.
2005-02-02 08:11:12 criteriamatch foundry01.nandomedia.com:procs (NULL):(NULL):procs
2005-02-02 08:11:12 failed minduration 0<300
OK
2005-02-02 08:16:12 Got page message from foundry01.nandomedia.com:procs
2005-02-02 08:16:12 0 alerts to go
And this looks suspicious.
What's supposed to happen is that once the alert is first reported to the hobbitd_alert module, the module keeps track of when the next alert is due (this is where the REPEAT interval comes into play). If no alerts are due, you get the "0 alerts to go" message.
So something messes up the timekeeping, and we never get around to testing if the DURATION triggers after the first attempt.
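To make the intended behaviour concrete, here is a minimal sketch in Python of how the DURATION and REPEAT checks are meant to interact. It is not the hobbitd_alert code; the names AlertState and alert_due are made up for illustration, and the only details taken from the thread are that DURATION>5 corresponds to 300 seconds (the "0<300" in the log) and that REPEAT=5 means a new alert every 5 minutes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Hypothetical per-status bookkeeping, not the real hobbitd_alert struct."""
    event_start: int                   # seconds: when the status went red
    next_alert: Optional[int] = None   # None until the first alert has gone out


def alert_due(state: AlertState, now: int, min_duration_s: int, repeat_s: int) -> bool:
    """Return True if an alert should be sent at time `now` (in seconds)."""
    # DURATION check: the status must have been red long enough.
    # This is the "failed minduration 0<300" case in the log.
    if now - state.event_start < min_duration_s:
        return False  # deliberately leaves next_alert untouched

    # REPEAT check: nothing to send until the next alert is due.
    # This is the "0 alerts to go" case in the log.
    if state.next_alert is not None and now < state.next_alert:
        return False

    # Send now and schedule the next repeat.
    state.next_alert = now + repeat_s
    return True


# Walk through the rule from the thread: DURATION>5 REPEAT=5 (both 300 seconds).
state = AlertState(event_start=0)
results = [alert_due(state, t, 300, 300) for t in (0, 300, 400, 600)]
```

In this sketch the first message fails the duration test without scheduling anything, so the second message five minutes later still gets evaluated and sends the alert. The symptom in the log is that this second evaluation never happens.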
[after looking over the code for 10 minutes]
I think I've got it, but there have been quite a few changes to various bits, so I don't want to send one-line fixes now. I'll come up with a proper full package, which will also include fixes for many of the other bugs that have been reported for beta6.
Henrik