RECOVERED flag in hobbit-alerts.cfg question
Hi all!
As I expand my alerting ruleset to do cool things like only page me about "printers" during the day and so on, I had a weird occurrence today.
Basically, I have rules that list less-critical things first, and what I'm trying to do is something like this:
Rules for printers: Send emails and pages when printers go offline and when they recover, then STOP HERE, because I do not need other rules to apply (like "connectivity" below)
Rules for "connectivity" for anything: Send emails, pages.
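To make the intent concrete, here is a minimal Python sketch of the "first matching rule wins" behavior I am after. It only models what I expect to happen, not how Hobbit actually evaluates hobbit-alerts.cfg. The names `RULES` and `matching_rules` are made up for illustration, and the three printer patterns from my ruleset are folded into one regex for brevity.

```python
import re

# Hypothetical model of the intended behavior, NOT Hobbit's real matching code.
# Each rule: (name, host regex or None, service or None, stop_after_match)
RULES = [
    ("printers", re.compile(r"^(pr|hp|lp).*mcw\.edu"), None, True),
    ("connectivity", None, "conn", False),
]

def matching_rules(host, service):
    """Return the names of the rules that should fire for this host/service."""
    matched = []
    for name, host_re, svc, stop in RULES:
        if host_re is not None and not host_re.search(host):
            continue  # host pattern does not match
        if svc is not None and svc != service:
            continue  # service does not match
        matched.append(name)
        if stop:
            break  # a "stop here" rule ends the search for more rules
    return matched

print(matching_rules("lp4.phys.mcw.edu", "conn"))
print(matching_rules("www.mcw.edu", "conn"))
```

Under this model a printer's "conn" status only ever reaches the printers rule, for the initial alert, the repeats, and the recovery alike, while any other host falls through to the "connectivity" rule.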
What is happening is this:
When the device (printer) went offline, I got alerted. Yay! I then got alerted exactly two hours later. Exactly what I want. Yay again!
Then, the printer recovered. I got *two* emails and *two* pages, because presumably the printers rule *AND* the "connectivity" rule applied, even though I only want the one rule to apply.
Am I missing some intended behavior, or is the "recovered" flag ignoring how I want my rules to "stop" when certain conditions are met?
My notifications log:
Printer goes offline:
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153864229 500
Tue Jul 25 16:50:29 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153864229 500
Same message sent two hours later (yay!)
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153871432 500
Tue Jul 25 18:50:32 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153871432 500
*DOUBLE* messages sent when recovered state occurs...???
Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792
Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792
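For anyone who wants to check their own notifications log for the same thing, here is a rough Python sketch that groups the recovery entries above by recipient and collects the bracketed numbers. I am assuming the bracketed number identifies which rule or recipient entry in the config produced the message, and I have not tried to interpret the trailing fields. The field names in the regex and the function name are my own labels, not anything official.

```python
import re
from collections import defaultdict

# The four recovery entries copied from the log above.
RECOVERY_LINES = [
    "Tue Jul 25 19:49:21 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[158] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[159] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) sysadmins[175] 1153874961 500 10792",
    "Tue Jul 25 19:49:22 2006 lp4.phys.mcw.edu.conn (141.106.188.244) pagers[176] 1153874961 500 10792",
]

# Field names are guesses for illustration only.
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"(?P<target>\S+) \((?P<ip>[\d.]+)\) "
    r"(?P<recipient>\w+)\[(?P<rule_id>\d+)\] "
    r"(?P<rest>.*)$"
)

def rule_ids_per_recipient(lines):
    """Map each recipient to the sorted list of bracketed ids that notified it."""
    ids = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that does not look like a log entry
        ids[m.group("recipient")].add(int(m.group("rule_id")))
    return {recipient: sorted(found) for recipient, found in ids.items()}

for recipient, found in rule_ids_per_recipient(RECOVERY_LINES).items():
    print(recipient, found)
```

Each recipient shows up under two different bracketed numbers on recovery (158/159 and 175/176), whereas the offline and repeat alerts only show 158/159, which is what makes me think a second rule is firing.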
My ruleset for alerts:
These rules change defaults for printers warnings/alerts (only email or page every 2 hours)
use of IGNORE rule means NO OTHER RULE matches printers after this rule....
HOST=%^pr.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
HOST=%^hp.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
HOST=%^lp.*mcw\.edu
MAIL sysadmins REPEAT=120 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=120 RECOVERED FORMAT=SMS
IGNORE
Anything that loses connectivity, email/page every 30 minutes
SERVICE=conn
MAIL sysadmins REPEAT=30 RECOVERED FORMAT=TEXT
MAIL pagers REPEAT=30 RECOVERED FORMAT=SMS
participants (1)
-
brodie@mcw.edu