On Wed, Oct 26, 2005 at 04:12:00AM -0700, Charles Jones wrote:
> Perhaps it's because I'm working on this at 4am, but I'm having a problem with the EXHOST option, which according to hobbitd_alert --test isn't working. I'm also not sure how to do a particular host/service exclusion.
> Here's basically what my alert config below is meant to accomplish:
> - For any alerts on any servers, send alerts to an alert email address.
> - For 2 particular web servers (web5.mydomain.com and web6.mydomain.com), send an alert to one person, but *not* the alert alias.
> - For a set of oracle servers, send an extra alert message to an alternate email address/cellphone.
One way of doing these would be:

# 2 special webservers, that ONLY get this alert (2)
HOST=$WEB_SERVERS SERVICE=msgs COLOR=red
    MAIL webdev@mydomain.com STOP

# Oracle alerts (3)
HOST=$ORACLE_SERVERS SERVICE=msgs,oradb,orasys COLOR=red
    MAIL dbacell@cellphone.com FORMAT=sms

# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red
    MAIL alert@mydomain.com

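To make the rule ordering concrete, here is a small Python model of how the three rules above are meant to combine. It is a sketch, not hobbitd_alert's actual code: it assumes the rules are checked top to bottom and that STOP ends the search for further recipients. The macro contents, the Oracle hostnames and the `recipients_for` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union

# Hypothetical macro expansions: the thread does not show how
# $WEB_SERVERS, $ORACLE_SERVERS and $ALL_HOSTS are defined.
WEB_SERVERS = {"web5.mydomain.com", "web6.mydomain.com"}
ORACLE_SERVERS = {"ora1.mydomain.com", "ora2.mydomain.com"}  # hypothetical hostnames
ANY = "*"  # models a wildcard that matches every host or service


@dataclass
class Recipient:
    address: str
    stop: bool = False  # models the STOP keyword


@dataclass
class Rule:
    hosts: Union[Set[str], str]
    services: Union[Set[str], str]
    color: str
    recipients: List[Recipient] = field(default_factory=list)

    def matches(self, host: str, service: str, color: str) -> bool:
        host_ok = self.hosts == ANY or host in self.hosts
        service_ok = self.services == ANY or service in self.services
        return host_ok and service_ok and color == self.color


RULES = [
    # (2) web5/web6: msgs alerts go only to webdev, then STOP
    Rule(WEB_SERVERS, {"msgs"}, "red", [Recipient("webdev@mydomain.com", stop=True)]),
    # (3) extra SMS for the Oracle servers
    Rule(ORACLE_SERVERS, {"msgs", "oradb", "orasys"}, "red", [Recipient("dbacell@cellphone.com")]),
    # (1) default rule: $ALL_HOSTS modelled as "any host"
    Rule(ANY, ANY, "red", [Recipient("alert@mydomain.com")]),
]


def recipients_for(host: str, service: str, color: str) -> List[str]:
    """Scan rules top to bottom, collecting recipients until one has STOP."""
    found: List[str] = []
    for rule in RULES:
        if not rule.matches(host, service, color):
            continue
        for recipient in rule.recipients:
            found.append(recipient.address)
            if recipient.stop:
                return found
    return found


print(recipients_for("web5.mydomain.com", "msgs", "red"))   # ['webdev@mydomain.com']
print(recipients_for("ora1.mydomain.com", "oradb", "red"))  # ['dbacell@cellphone.com', 'alert@mydomain.com']
print(recipients_for("web5.mydomain.com", "cpu", "red"))    # ['alert@mydomain.com']
```

Under this model, the last case shows a consequence of rule (2) matching only SERVICE=msgs: other services on web5 and web6 still fall through to the default rule and reach the alert alias.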
> - After hours (from 5pm until 8am), only send alerts to an alternate email address (but I still need the separate alert for the web5 and web6 hosts described in #2).
> - After hours (from 5pm until 8am), send an alert to my cellphone for any host and service that has been red for more than 30 minutes.
For these, modify the default rule marked (1) to use different recipients based on the time of day, e.g.:

# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red
    MAIL alert@mydomain.com TIME=*:0800:1700
    # Outside office hours, mail alerts to a different address (4)
    MAIL alternate@mydomain.com TIME=*:1700:0800
    # Outside office hours, send to my cell phone (5)
    MAIL mycell@cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800

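The time-based recipients in rule (1) can be sanity-checked with a small Python model. This is a sketch under stated assumptions, not hobbitd_alert's implementation: a TIME window whose end is earlier than its start is taken to wrap past midnight (which is what *:1700:0800 is meant to express), the end time is treated as exclusive, the day field is ignored, and DURATION>30 is read as "red for more than 30 minutes". The helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_time_spec(spec: str) -> Tuple[str, int, int]:
    """Split a TIME value such as '*:1700:0800' into (days, start, end)."""
    days, start, end = spec.split(":")
    return days, int(start), int(end)


def in_window(hhmm: int, start: int, end: int) -> bool:
    """Assumption: start inclusive, end exclusive, end < start wraps past midnight."""
    if start <= end:
        return start <= hhmm < end
    return hhmm >= start or hhmm < end


# Recipient lines of default rule (1): (address, TIME value, DURATION> minutes or None)
DEFAULT_RULE: List[Tuple[str, str, Optional[int]]] = [
    ("alert@mydomain.com", "*:0800:1700", None),
    ("alternate@mydomain.com", "*:1700:0800", None),
    ("mycell@cellphone.com", "*:1700:0800", 30),
]


def default_rule_recipients(hhmm: int, red_minutes: int) -> List[str]:
    """Who receives a red alert at clock time hhmm after red_minutes of red."""
    out = []
    for address, spec, min_duration in DEFAULT_RULE:
        _days, start, end = parse_time_spec(spec)  # day field ('*') ignored here
        if not in_window(hhmm, start, end):
            continue
        if min_duration is not None and red_minutes <= min_duration:
            continue  # DURATION>30 not yet satisfied
        out.append(address)
    return out


print(default_rule_recipients(1000, 5))   # ['alert@mydomain.com']
print(default_rule_recipients(2200, 5))   # ['alternate@mydomain.com']
print(default_rule_recipients(2200, 45))  # ['alternate@mydomain.com', 'mycell@cellphone.com']
```

Under these assumptions, a short after-hours outage only reaches the alternate address, and the cellphone is added once the status has been red for more than 30 minutes.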
> - Do not alert for high load average on a particular server from 6 to 10am.
There's no really elegant way of doing that. It makes me think that perhaps there should be some way of defining a "no-action" rule: "For these conditions, do NOT send any alerts, and stop looking for more alert recipients". But for now, you'll have to modify the default rule to exclude that host, then set up specific rules for that host. So your default rule becomes:

# Default rule (1)
HOST=$ALL_HOSTS SERVICE=* COLOR=red EXHOST=dataproc1.mydomain.com
    MAIL alert@mydomain.com TIME=*:0800:1700
    # Outside office hours, mail alerts to a different address (4)
    MAIL alternate@mydomain.com TIME=*:1700:0800
    # Outside office hours, send to my cell phone (5)
    MAIL mycell@cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800

and the specific rules for that host:

# Load avg alerts only from 10am -> 6am
HOST=dataproc1.mydomain.com SERVICE=la TIME=*:1000:0600
    MAIL alert@mydomain.com TIME=*:0800:1700
    MAIL alternate@mydomain.com TIME=*:1700:0800
    MAIL mycell@cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800

# All other services alert like the normal default rule.
HOST=dataproc1.mydomain.com EXSERVICE=la
    MAIL alert@mydomain.com TIME=*:0800:1700
    MAIL alternate@mydomain.com TIME=*:1700:0800
    MAIL mycell@cellphone.com FORMAT=sms DURATION>30 TIME=*:1700:0800

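The exclusion scheme can be checked with a small Python model of the three dataproc1-related rules. It is a sketch, not hobbitd_alert's own logic. It assumes that a TIME window whose end is earlier than its start wraps past midnight (end time exclusive), and it only considers red alerts, so COLOR is left out. The `disk` service, the helper names and the data structures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


def in_window(hhmm: int, start: int, end: int) -> bool:
    # Assumption: end is exclusive and end < start wraps past midnight.
    if start <= end:
        return start <= hhmm < end
    return hhmm >= start or hhmm < end


@dataclass
class Recipient:
    address: str
    start: int
    end: int
    min_duration: Optional[int] = None  # models DURATION>N (minutes)


@dataclass
class Rule:
    recipients: List[Recipient]
    hosts: Optional[Set[str]] = None                   # HOST; None means any host
    exhosts: Set[str] = field(default_factory=set)     # EXHOST
    services: Optional[Set[str]] = None                # SERVICE; None means any service
    exservices: Set[str] = field(default_factory=set)  # EXSERVICE
    time: Optional[Tuple[int, int]] = None             # rule-level TIME window

    def matches(self, host: str, service: str, hhmm: int) -> bool:
        if self.hosts is not None and host not in self.hosts:
            return False
        if host in self.exhosts or service in self.exservices:
            return False
        if self.services is not None and service not in self.services:
            return False
        return self.time is None or in_window(hhmm, *self.time)


def normal_recipients() -> List[Recipient]:
    # The three recipient lines shared by all three rules.
    return [
        Recipient("alert@mydomain.com", 800, 1700),
        Recipient("alternate@mydomain.com", 1700, 800),
        Recipient("mycell@cellphone.com", 1700, 800, min_duration=30),
    ]


DATAPROC = "dataproc1.mydomain.com"
RULES = [
    Rule(normal_recipients(), exhosts={DATAPROC}),  # default rule (1)
    Rule(normal_recipients(), hosts={DATAPROC}, services={"la"}, time=(1000, 600)),
    Rule(normal_recipients(), hosts={DATAPROC}, exservices={"la"}),
]


def route(host: str, service: str, hhmm: int, red_minutes: int) -> List[str]:
    """Collect recipients of a red alert from every matching rule."""
    out = []
    for rule in RULES:
        if not rule.matches(host, service, hhmm):
            continue
        for r in rule.recipients:
            if not in_window(hhmm, r.start, r.end):
                continue
            if r.min_duration is not None and red_minutes <= r.min_duration:
                continue
            out.append(r.address)
    return out


print(route(DATAPROC, "la", 700, 45))   # [] (inside the 6-10am quiet period)
print(route(DATAPROC, "la", 1100, 5))   # ['alert@mydomain.com']
print(route(DATAPROC, "disk", 700, 5))  # ['alternate@mydomain.com']
```

Because the default rule excludes dataproc1 via EXHOST, and the two host-specific rules split its services with SERVICE=la and EXSERVICE=la, each dataproc1 alert is matched by at most one rule in this model, so no recipient is notified twice.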
Regards, Henrik