Is there a way to "quietly" disable hosts that have NOTICE set?
I have the NOTICE flag set for all of my production hosts - I want pages to go out if someone disables or enables any of them.
However, when backups are done, the Oracle databases are brought down, which triggers an alert. If they are manually disabled, the NOTICE message goes out, which also wakes people up for no reason.
If I use DOWNTIME in bb-hosts, then I have to specify a window that is guaranteed to be longer than the time it could possibly take to back up the databases (the backup time varies, so any fixed window will surely be wrong from time to time). So what ends up happening is that, for example, I specify an hour of DOWNTIME, but the backups sometimes take only 30 minutes. That leaves a 30-minute window where a real alert would be masked, which is unacceptable in a production environment.
I guess what I'm looking for is a way to send a command to Hobbit from a shell script (called from the db backup script) that would put a host/service in maintenance mode (disabled, blue dot) and NOT send a NOTICE page.
-Charles
On Monday 25 September 2006 07:56, Charles Jones wrote:
> I have the NOTICE flag set for all of my production hosts - I want pages to go out if someone disables or enables any of them.
> However, when backups are done, the Oracle databases are brought down, which triggers an alert. If they are manually disabled, the NOTICE message goes out, which also wakes people up for no reason.
> If I use DOWNTIME in bb-hosts, then I have to specify a window that is guaranteed to be longer than the time it could possibly take to back up the databases (the backup time varies, so any fixed window will surely be wrong from time to time). So what ends up happening is that, for example, I specify an hour of DOWNTIME, but the backups sometimes take only 30 minutes. That leaves a 30-minute window where a real alert
for a test being disabled, during scheduled downtime,
> would be masked, which is unacceptable in a production environment.
Hmm, in our environment it is acceptable for tests to stay disabled for longer than a change requires, as long as that is within the scheduled downtime for the change.
Of course, the fact that a test was disabled would still be logged by Hobbit.
> I guess what I'm looking for is a way to send a command to Hobbit from a shell script (called from the db backup script) that would put a host/service in maintenance mode (disabled, blue dot) and NOT send a NOTICE page.
If you set the TIME on your NOTICE alerts so that it excludes the downtime, the notify messages will not be sent during that period. Of course, you could still have a NOTICE alert that covers this time but does not page (to track anything that does occur).
BTW, it might help if you include some information on how your alerting is set up, e.g. the line from bb-hosts and any matching rules from hobbit-alerts.cfg.
Regards, Buchan
-- Buchan Milne ISP Systems Specialist B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
How about something like this:
- create a "test" named something like "backup". No alerts should be configured for this test; you can use NOPROPRED or DOWNTIME to prevent it from showing up as red and causing anxiety.
- make your normal db tests "DEPEND" on this new test.
- have your backup scripts send a command to the Hobbit server setting it to red at the start and green at the finish. Be sure to set the status duration long enough to prevent purples.
This should cause your normal db tests to go clear during backups, but restore normal operation after the backups complete.
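A minimal sketch of what the backup-script side of this might look like, assuming the Hobbit client environment provides BB (path to the bb client) and BBDISP (the Hobbit server), and assuming the usual "status+LIFETIME HOSTNAME.TESTNAME COLOR text" message form with dots in the hostname replaced by commas. The function names, the "backup" test name, the host db1.example.com and the 120-minute lifetime are all made up for illustration; check them against your own setup.

```shell
#!/bin/sh
# Sketch only: helper names, the "backup" test name and the 120-minute
# lifetime are illustrative assumptions, not something Hobbit defines.

# Build a status message of the form
#   status+LIFETIME HOSTNAME.TESTNAME COLOR text
# Args: host test color lifetime text
# Dots in the hostname are replaced by commas (assumed BB-style convention).
build_status_msg() {
    host=$(printf '%s' "$1" | tr '.' ',')
    printf 'status+%s %s.%s %s %s' "$4" "$host" "$2" "$3" "$5"
}

# Send the "backup" status for a host.
# Args: host color text
# If BB and BBDISP are both set, hand the message to the bb client;
# otherwise just print it, so the script can be dry-run anywhere.
send_backup_status() {
    msg=$(build_status_msg "$1" backup "$2" 120 "$3")
    if [ -n "${BB:-}" ] && [ -n "${BBDISP:-}" ]; then
        "$BB" "$BBDISP" "$msg"
    else
        printf '%s\n' "$msg"
    fi
}

# Intended use from the db backup script (hypothetical host name):
#   send_backup_status db1.example.com red   "backup started"
#   ... run the backup ...
#   send_backup_status db1.example.com green "backup finished"
```

The lifetime is there so the "backup" status does not go purple if the backup runs long; it should be chosen to exceed the longest plausible backup, since the green status at the end replaces it anyway.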
Thanks, Larry Barber
On 9/26/06, Buchan Milne <bgmilne at staff.telkomsa.net> wrote:
[...]
participants (3)
- bgmilne@staff.telkomsa.net
- jonescr@cisco.com
- lebarber@gmail.com