Hi!
I've got a problem with my colleagues and the alert storm that occurs when a whole batch farm is rebooted for a kernel upgrade etc., and the person who did it doesn't disable the alerts or acknowledge a downtime beforehand. Don't ask me why ... he hates web GUIs and wants to run just one command on the console ...
I know I asked something similar before: http://www.hswn.dk/hobbiton/2009/01/msg00398.html (Re: [hobbit] remote/commandline Acknowledge Alerts)
and Henrik answered quite rightly, as always :-) but that only works if I know the ID of the event, and in our situation I would need it before the event(s) start .. :-(
They don't want to get 5 or more mails for a single machine (with about 50 or more machines affected) ...
So we've played around a bit with Duration, Recovered ..
Now I get two mails for conn (red & recovered) and one for cpu (yellow, because of the reboot) ... Of course we can reduce that to only two mails (disable the Recovered mail for conn, or set a higher Duration for the cpu reboot mail) ...
My question is whether there already exists an intelligent extra mail script, or something else, which looks at the conn condition and, if it's bad, doesn't send any alarms for the other services, only for conn ....
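To make clearer what I mean, here is a rough Python sketch of the filter logic only. The function name and the dictionary of service colours are made up by me for illustration; this is not something Hobbit provides, and how such a script would find out the current conn colour of the host is left open.

```python
def should_send_alert(service, colors):
    """Decide whether an alert mail for `service` should go out.

    `colors` is a hypothetical mapping of service name -> current colour
    for one host, e.g. {"conn": "red", "cpu": "yellow"}.
    """
    # conn alerts are always passed through
    if service == "conn":
        return True
    # every other service is silenced while conn is red on this host
    return colors.get("conn") != "red"
```

With this, a rebooting machine whose conn is red would only produce the conn mail, and the cpu mail would go out only if conn is not red at that moment.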
Thanks & cheers
Martin