hobbit-alerts.cfg question
Hi all
The alerting is starting to take shape but I've a question regarding how the alerting works. If I have a stanza similar to the following, how is it evaluated? Once for all hosts, or for one host at a time?
HOST=%.*
    # Proliant tests
    MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS REPEAT=1440m
    MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS RECOVERED
    # conn where status is RED
    MAIL sms@somecompany.com COLOR=red SERVICE=conn EXPAGE=dev REPEAT=1440m
    MAIL sms@somecompany.com COLOR=red SERVICE=conn EXPAGE=dev RECOVERED
    # conn where status is RED (dev/test)
    MAIL email@somecompany.com COLOR=red SERVICE=conn PAGE=dev REPEAT=1440m
    MAIL email@somecompany.com COLOR=red SERVICE=conn PAGE=dev RECOVERED
    # cpu,disk,memory where status is RED
    MAIL sms@somecompany.com COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev REPEAT=1440m
    MAIL sms@somecompany.com COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev RECOVERED
    # Dev servers
    MAIL email@somecompany.com COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m
    MAIL email@somecompany.com COLOR=red SERVICE=cpu,disk,memory PAGE=dev RECOVERED
    # Non-dev status YELLOW
    MAIL email@somecompany.com COLOR=yellow SERVICE=cpu,disk,memory REPEAT=1440m DURATION>30m
    MAIL email@somecompany.com COLOR=yellow SERVICE=cpu,disk,memory RECOVERED
Also, I've noticed that when a fault occurs I get two emails (or SMSes), and then another one when the fault is rectified. I'm thinking this is because of the 'RECOVERED' line, but I thought that would only trigger when the fault clears. Have I misunderstood?
Thanks
CC
-- RHCE#805007969328369
Hi Colin
One line per alert, with RECOVERED on the end. Change it to something like this:
    MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED
Cheers Vernon
On Fri, Oct 8, 2010 at 10:40 AM, Colin Coe <colin.coe at gmail.com> wrote:
[quoted message snipped; identical to the original post above]
To unsubscribe from the xymon list, send an e-mail to xymon-unsubscribe at xymon.com
Cool. Thanks Vernon.
On Fri, Oct 8, 2010 at 11:12 AM, Vernon Everett <everett.vernon at gmail.com> wrote:
[quoted thread snipped; identical to the messages above]
In <AANLkTi=OUGSQU_YvUn3temW2Pii09zPXyvUkmfjNTFbx at mail.gmail.com> Colin Coe <colin.coe at gmail.com> writes:
> The alerting is starting to take shape but I've a question regarding how the alerting works. If I have a stanza similar to the following, how is it evaluated? Once for all hosts, or for one host at a time?
I understand your curiosity, but does it really matter how? In any case, it is evaluated whenever a potential alert may be generated, based on the host/service combination, time of day and all the other criteria. Think of it as a set of rules: each time something is red or yellow, hobbitd_alert looks at this set of rules and finds the actions that match (if any).
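To make the per-event evaluation concrete, here is the relevant part of the stanza from the question, annotated for one hypothetical event: a host called db1, assumed to sit on the "dev" page, reporting a red "disk" status. The host name and its page placement are illustrative assumptions; the MAIL lines are copied from the question (the conn and proliant lines are left out because their SERVICE does not match "disk"). Each MAIL line is checked against this single host/service event, and every line whose criteria all match produces its own message.

```
# Hypothetical event: host "db1" (assumed to be on the "dev" page), service "disk", color red
HOST=%.*
    # No match: EXPAGE=dev excludes hosts on the "dev" page
    MAIL sms@somecompany.com COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev REPEAT=1440m
    MAIL sms@somecompany.com COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev RECOVERED
    # Match: COLOR, SERVICE and PAGE all fit, so this sends the first e-mail
    MAIL email@somecompany.com COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m
    # Also a match, so a second e-mail goes to the same address; RECOVERED adds
    # a recovery notice but does not stop this line alerting on the fault itself
    MAIL email@somecompany.com COLOR=red SERVICE=cpu,disk,memory PAGE=dev RECOVERED
    # No match: COLOR=yellow does not fit a red status
    MAIL email@somecompany.com COLOR=yellow SERVICE=cpu,disk,memory REPEAT=1440m DURATION>30m
```

When the disk status later recovers, only the RECOVERED line sends a notice, which would account for the "two messages, then one" pattern described in the question.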
> HOST=%.*
>     # Proliant tests
>     MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS REPEAT=1440m
>     MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS RECOVERED
>
> Also, I've noticed that when a fault occurs I get two emails (or SMSes), and then another one when the fault is rectified. I'm thinking this is because of the 'RECOVERED' line, but I thought that would only trigger when the fault clears. Have I misunderstood?
I think you have. Your configuration sets up two alerting actions, both sending mail to the same recipient, and both of them fire when the fault occurs (a RECOVERED line still alerts on the fault itself). That's why you get two messages. What you want to do is simpler:
HOST=%.*
    # Proliant tests
    MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED
This will give you one message when the service goes red or yellow, and one when it recovers. "RECOVERED" is an "add-on" to the normal alert, since you probably would like to know not only when something is fixed, but also when it broke.
Regards, Henrik
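Applying the same one-line-per-alert change to every pair in the original stanza would give roughly the following. This is an untested sketch rather than something posted in the thread: the addresses are the placeholders from the question, and the comments are Colin's originals.

```
HOST=%.*
    # Proliant tests
    MAIL sms@somecompany.com SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED
    # conn where status is RED
    MAIL sms@somecompany.com COLOR=red SERVICE=conn EXPAGE=dev REPEAT=1440m RECOVERED
    # conn where status is RED (dev/test)
    MAIL email@somecompany.com COLOR=red SERVICE=conn PAGE=dev REPEAT=1440m RECOVERED
    # cpu,disk,memory where status is RED
    MAIL sms@somecompany.com COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev REPEAT=1440m RECOVERED
    # Dev servers
    MAIL email@somecompany.com COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m RECOVERED
    # Non-dev status YELLOW
    MAIL email@somecompany.com COLOR=yellow SERVICE=cpu,disk,memory REPEAT=1440m DURATION>30m RECOVERED
```

One side effect worth checking: in the original, the yellow RECOVERED line had no DURATION>30m, so by the reasoning above it would have alerted on a yellow status straight away. In the merged line, the 30-minute delay applies to the initial alert as well.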
participants (3)
- colin.coe@gmail.com
- everett.vernon@gmail.com
- henrik@hswn.dk