alert storm / intelligent extra mailscript
Hi!
I've got a problem with my colleagues and the alert storm when a whole batch farm is rebooted for a kernel upgrade etc. and the person who did it doesn't deactivate the alerts or set an Acknowledge/Downtime beforehand. Don't ask me why ... he hates web GUIs and wants to run only one command on the console ...
I know, I asked something similar before: http://www.hswn.dk/hobbiton/2009/01/msg00398.html Re: [hobbit] remote/commandline Acknowledge Alerts
and Henrik answered quite right, as always :-) But this only works if I know the ID of the event, and in our situation I need it before the event(s) start .. :-(
They don't want to get 5 or more mails for only one machine (with ca. 50 or more machines) ...
So we've played around a bit with Duration and Recovered ..
Now I get two mails for conn (red & recovered) and one for cpu (yellow for the reboot) ... we can of course reduce them to only two mails (deactivate Recovered for conn or set a higher Duration for the cpu reboot mail) ...
My question is whether there already exists an intelligent extra mailscript, or something else, which looks at the conn condition and, if it's bad, doesn't send any alarm for the other services, only for conn ...
Thanks & cheers
Martin
On Friday 17 April 2009 11:16:17 Martin Flemming wrote:
> Hi!
> I've got a problem with my colleagues and the alert storm when a whole batch farm is rebooted for a kernel upgrade etc. and the person who did it doesn't deactivate the alerts or set an Acknowledge/Downtime beforehand. Don't ask me why ... he hates web GUIs and wants to run only one command on the console ...
> I know, I asked something similar before: http://www.hswn.dk/hobbiton/2009/01/msg00398.html Re: [hobbit] remote/commandline Acknowledge Alerts
IMHO, planned changes should be preceded by disabling the tests that would be affected, which can easily be done from the command line ...
On Fri, 17 Apr 2009, Buchan Milne wrote:
> IMHO, planned changes should be preceded by disabling the tests that would be affected,
Yep, you're right of course ..
> which can easily be done from the command line ...
But how do I know the alert ID for the host/service in advance, e.g. cpu & conn for host1, host2?
NAME bb-ack.cgi - Hobbit CGI script to acknowledge alerts
bb-ack.cgi is passed a QUERY_STRING environment variable with the ACTION, NUMBER, DELAY and MESSAGE parameters.
NUMBER is the number identifying the host/service to be acknowledged. It is included in all alert-messages sent out by Hobbit.
Or am I missing something?
cheers,
Martin
OK, I have to read the manual again first .. :-(
http://www.hswn.dk/hobbit/help/manpages/man1/bb.1.html
XYMON MESSAGE SYNTAX
disable HOSTNAME.TESTNAME DURATION <additional text>
Disables a specific test for DURATION minutes. This will cause the status of this test to be listed as "blue" on the BBDISPLAY server, and no alerts for this host/test will be generated. If DURATION is given as a number followed by s/m/h/d, it is interpreted as being in seconds/minutes/hours/days respectively. To disable all tests for a host, use an asterisk * for TESTNAME.
Right ?
I will try it .. sorry
cheers, martin
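For the batch-farm case, the disable message quoted from the bb man page could be generated for a whole list of hosts in one go. The following is only a rough sketch in Python, not something taken from Hobbit itself. The function names, the default duration and the reason text are made up for illustration. It also assumes that dots in a hostname are written as commas (as in Hobbit status messages) and that the client is called as "bb RECIPIENT MESSAGE"; please check both against your own installation.

```python
import subprocess


def disable_message(host, test="*", duration="2h", reason="planned reboot"):
    """Build one 'disable HOSTNAME.TESTNAME DURATION <additional text>' message."""
    # Assumption: dots in the hostname are replaced by commas, so that the
    # last dot in the message separates the host name from the test name.
    return f"disable {host.replace('.', ',')}.{test} {duration} {reason}"


def disable_hosts(hosts, recipient, bb="bb", dry_run=True, **kwargs):
    """Return one disable message per host; send them unless dry_run is set.

    'recipient' is the BBDISPLAY server; 'bb' is the path to the bb client.
    Both are hypothetical parameters for this sketch.
    """
    messages = [disable_message(h, **kwargs) for h in hosts]
    if not dry_run:
        for msg in messages:
            # Assumed invocation form: bb RECIPIENT MESSAGE
            subprocess.run([bb, recipient, msg], check=True)
    return messages


if __name__ == "__main__":
    # Dry run: only print what would be sent for two example hosts.
    for line in disable_hosts(["host1", "host2"], recipient="127.0.0.1"):
        print(line)
```

With dry_run left at True nothing is sent, so the messages can be checked first. Using the asterisk as TESTNAME disables all tests of a host, which is what you want before a reboot.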
Eureka!
Works like a charm, of course :-)
.. but if someone has such an "intelligent mailscript", I'm still interested ... :-)
martin
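Just to make the idea of such an "intelligent mailscript" concrete: the rule would be that when conn is red for a host, only the conn alert is mailed and everything else for that host is dropped. Below is a minimal sketch of that rule in Python. It is not an existing Hobbit feature or script; the (host, service, color) tuple format and the function name are hypothetical, and how the alerts get collected from Hobbit is left open.

```python
def filter_alerts(alerts):
    """Keep only the conn alert for hosts whose conn test is red.

    alerts: iterable of (host, service, color) tuples (hypothetical format).
    Alerts for hosts with a working conn test are passed through unchanged.
    """
    alerts = list(alerts)
    # Hosts that are unreachable according to their conn test.
    down = {host for host, service, color in alerts
            if service == "conn" and color == "red"}
    return [(host, service, color) for host, service, color in alerts
            if host not in down or service == "conn"]


if __name__ == "__main__":
    pending = [
        ("host1", "conn", "red"),
        ("host1", "cpu", "yellow"),
        ("host2", "cpu", "yellow"),
    ]
    for alert in filter_alerts(pending):
        print(alert)
```

A real script would have to collect the alerts of one host for a short while before mailing, because conn and cpu do not necessarily go bad at the same moment.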
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Regards
Martin Flemming
Martin Flemming
DESY / IT          office : Building 2b / 008a
Notkestr. 85       phone  : 040 - 8998 - 4667
22603 Hamburg      mail   : martin.flemming at desy.de
participants (2)
- bgmilne@staff.telkomsa.net
- martin.flemming@desy.de