Excluding Hosts from Paging at certain times
Hi there,
Not sure if this is the right place to send this, as I've only just subscribed and the emails don't say exactly where to email, but here goes.
We have just upgraded from BB to Hobbit after 5 years or so, and we're not looking back. We have some servers that have to be rebooted daily because of some poorly written software (not by us) that runs on them. I have tried using the DOWNTIME= option in bb-hosts, but it only seems to ignore the net tests like http, conn, etc. The server still has the BB client installed, so when the server comes back up the BB client starts before some of the monitored services do, marks them red, and we get paged on them.
Given our worldwide customer base, we still want to know about issues 24x7, except for the 10 minutes or so when a server reboots. BB used to have the option of putting in something like !host:time:email, and it wouldn't page for that host at that particular time. Just wondering if someone can suggest the best way to work around this.
We have one alert that goes out via email to a sort of mailing list, and then another line that does email-to-pager. I was thinking of something along the lines of what I have below, although I am not sure about the TIME statement: whether you can have comma-separated values, and whether you can pass midnight, such as TIME=*:0910:0900.
There are only 4 hosts that currently reboot: one at midnight, two at 9am, one at 8:30am.
If someone could possibly give me a hand and point me in the right direction, it would be greatly appreciated.
Cheers
Allan
#extract of hobbit-alerts.cfg
$NPSERV=cpu,disk,msgs
HOST=host1.domain.com
MAIL emailalret at domain.com COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0010:0000
MAIL email at pageraddress.com COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0010:0000 STOP
HOST=host2.domain.com
MAIL emailalret at domain.com COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0000:0900,*:0910:0000
MAIL email at pageraddress.com COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0000:0900,*:0910:0000 STOP
HOST=*
MAIL emailalret at domain.com REPEAT=15 COLOR=red,purple RECOVERED
MAIL email at pageraddress.com FORMAT=SMS REPEAT=15 COLOR=red,purple EXSERVICE=$NPSERV
Just something else I forgot to mention: we want to send everything (msgs, disk, http, conn, etc.) to email, but not send disk, cpu or msgs alerts to the pager email address.
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
If you need more info let me know.
From the help file:
The first line defines a rule for alerting when something breaks on the host "www.foo.com". There are two recipients: webmaster at foo.com is notified if it is the "http" service that fails, and the notification is repeated once an hour until the problem is resolved. unixsupport at foo.com is notified if it is the "cpu", "disk" or "memory" tests that report a failure. Since there is no "REPEAT" setting for this recipient, the default is used which is to repeat the alert every 30 minutes.
OK, suppose now that the webmaster complains about getting e-mails at 4 AM in the morning. The webserver is not supposed to be running between 9 PM and 8 AM, so even though there is a problem, he doesn't want to hear about it until 7:30 - that gives him just enough time to fix the problem. So you must modify the rule so that it doesn't send out alerts until 7:30 AM:
HOST=www.foo.com
MAIL webmaster at foo.com SERVICE=http REPEAT=1h TIME=*:0730:2100
MAIL unixsupport at foo.com SERVICE=cpu,disk,memory
Adding the TIME setting on the recipient causes the alerts for this recipient to be suppressed, unless the time of day is within the interval. So with this setup, the webmaster gets his sleep.
What would have happened if you put the TIME setting on the rule instead of on the recipient ? Like this:
HOST=www.foo.com TIME=*:0730:2100
MAIL webmaster at foo.com SERVICE=http REPEAT=1h
MAIL unixsupport at foo.com SERVICE=cpu,disk,memory
Well, the webmaster would still have his nights to himself - but the TIME setting would then also apply to the alerts that go out when there is a problem with the "cpu", "disk" or "memory" services. So there would not be any mails going to unixsupport at foo.com when a disk fills up during the night.
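The difference between the two placements can be sketched in a few lines of Python. This is only an illustrative model of the behaviour the help file describes, not Hobbit's own code: the dictionary layout and the names `within` and `notified` are hypothetical, times are plain HHMM integers, and the window is assumed to include its start and exclude its end.

```python
def within(window, hhmm):
    """True if hhmm (e.g. 300 for 03:00) lies in (start, end); None means no restriction."""
    if window is None:
        return True
    start, end = window
    return start <= hhmm < end


def notified(rule, service, hhmm):
    """Addresses that would be alerted when `service` fails at time `hhmm`."""
    if not within(rule.get("time"), hhmm):
        return []  # a TIME on the rule filters every recipient below it
    return [
        r["mail"]
        for r in rule["recipients"]
        if service in r["services"] and within(r.get("time"), hhmm)
    ]


# TIME on the webmaster recipient only (first help-file example)
per_recipient = {
    "host": "www.foo.com",
    "recipients": [
        {"mail": "webmaster@foo.com", "services": {"http"}, "time": (730, 2100)},
        {"mail": "unixsupport@foo.com", "services": {"cpu", "disk", "memory"}},
    ],
}

# TIME on the rule itself (second help-file example)
per_rule = {
    "host": "www.foo.com",
    "time": (730, 2100),
    "recipients": [
        {"mail": "webmaster@foo.com", "services": {"http"}},
        {"mail": "unixsupport@foo.com", "services": {"cpu", "disk", "memory"}},
    ],
}
```

With this model, `notified(per_recipient, "disk", 300)` returns `["unixsupport@foo.com"]`, while `notified(per_rule, "disk", 300)` returns an empty list, which is the night-time disk alert going missing as described above.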
On Fri, 11 Mar 2005 11:40:00 +1100, ZanDAhaR <allan at zandahar.net> wrote:
Just something else I forgot to mention, we want to send everything to email such as msgs disk http conn etc, but not send disk, cpu, msgs to the pager email address
I've already read all of the man pages online, but I still couldn't be 100% sure about what can and can't be specified in terms of times.
Also, I needed clarification on what exactly the DOWNTIME parameter in bb-hosts relates to.
----- Original Message ----- From: "kevin grady" <kevin.grady at gmail.com> To: <hobbit at hswn.dk> Sent: Friday, March 11, 2005 11:57 AM Subject: Re: [hobbit] Excluding Hosts from Paging at certain times
On Fri, 11 Mar 2005 12:37:39 +1100, ZanDAhaR <allan at zandahar.net> wrote:
I've already read all of the man pages online, but I still couldn't be 100% sure about what can and can't be specified in terms of times.
Also, I needed clarification on what exactly the DOWNTIME parameter in bb-hosts relates to.
The DOWNTIME issue is a bug. I just ran into it (message posted on the 6th): we reboot every Sunday, but it was only ignoring network-based tests. I have to assume it'll be fixed in the next RC, or sometime before final.
-r
Ahh, excellent. As long as I know I'm not being stupid and doing something wrong :)
Also, I only just subscribed and could not find any sort of online version of the mailing list to search (is there one? if so, where?), so I would not have seen your earlier post.
Cheers Allan
----- Original Message ----- From: "Robert Edeker" <idxman01 at gmail.com> To: <hobbit at hswn.dk> Sent: Friday, March 11, 2005 12:54 PM Subject: Re: [hobbit] Excluding Hosts from Paging at certain times
On Fri, Mar 11, 2005 at 11:35:40AM +1100, ZanDAhaR wrote:
We have just upgraded from BB to Hobbit after 5 years or so, and we're not looking back. We have some servers that have to be rebooted daily because of some poorly written software (not by us) that runs on them. I have tried using the DOWNTIME= option in bb-hosts, but it only seems to ignore the net tests like http, conn, etc. The server still has the BB client installed, so when the server comes back up the BB client starts before some of the monitored services do, marks them red, and we get paged on them.
That doesn't sound right - the DOWNTIME setting should trigger on any test status that goes into Hobbit, whether it is a client-side or network-test.
I'll setup something similar here and see if I can reproduce this.
We have one alert that goes out via email to a sort of mailing list, and then another line that does email-to-pager. I was thinking of something along the lines of what I have below, although I am not sure about the TIME statement: whether you can have comma-separated values, and whether you can pass midnight, such as TIME=*:0910:0900.
"Yes" to both questions. You can definitely have a list of comma-separated settings, and wrapping around midnight also works. You can try running "hobbitd_alert --test HOSTNAME TESTNAME" and see how it decides which alerts to send out.
Your configuration looks OK to me.
Regards, Henrik
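For anyone who wants to reason about the TIME values in this thread before testing them against a live server, here is a minimal Python sketch of how a comma-separated list with midnight wrap-around could be evaluated. It is not taken from Hobbit's source: the function names are hypothetical, only the `*` (every day) day field is handled, and treating the start as inclusive and the end as exclusive is an assumption.

```python
from datetime import time


def _hhmm(text):
    """Convert an 'HHMM' string to minutes since midnight."""
    return int(text[:2]) * 60 + int(text[2:])


def in_window(spec, now):
    """True if `now` falls inside a single 'DAYS:HHMM:HHMM' window."""
    days, start, end = spec.split(":")
    if days != "*":
        raise ValueError("this sketch only handles the '*' day field")
    s, e = _hhmm(start), _hhmm(end)
    n = now.hour * 60 + now.minute
    if s <= e:
        return s <= n < e       # ordinary window within one day
    return n >= s or n < e      # window wraps past midnight


def time_matches(setting, now):
    """True if `now` is inside any window of a comma-separated TIME value."""
    return any(in_window(part, now) for part in setting.split(","))
```

Under these assumptions, `time_matches("*:0000:0900,*:0910:0000", time(9, 5))` is `False` (inside the reboot gap) and the same call with `time(9, 10)` is `True`; the single wrapped form `"*:0910:0900"` gives the same two answers. Running "hobbitd_alert --test HOSTNAME TESTNAME", as suggested above, remains the way to confirm what the server actually does.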
participants (4)
- allan@zandahar.net
- henrik@hswn.dk
- idxman01@gmail.com
- kevin.grady@gmail.com