recovery emails on alerts which don't generate pages == undesired behaviour
Hi,
Previously, we only wanted to alert on red and purple, and then send recovery emails when a status changed out of a red or purple state. This was easily accomplished by setting --alertcolors to red,purple.
Recently however, I've gotten some requests from people who want to get alerted on a few monitors when they yellow. No problem, I thought. I'll just change --alertcolors back to default, and then add COLOR=red,purple to all the existing alert definitions to start.
This caused the problem where a monitor would go into a yellow state (not causing a page because COLOR=red,purple) and then go back to green (which would then send a recovery page). This current behaviour doesn't make sense. Why would I want to be alerted on a recovery of something that never generated an alert in the first place? It would make more sense if the recovery email condition was tied to the COLOR definition.
So an alert with COLOR=red,yellow,purple (the default) would send a recovery message on leaving the red, yellow, or purple state to green, blue, or clear.
And an alert with COLOR=red,purple would only send a recovery message if it was leaving a red or purple state.
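The proposal in the two lines above boils down to a simple predicate. Here is a minimal sketch of it as a hypothetical Python model for discussion, not Hobbit's actual alerting code. It assumes a rule's COLOR setting is a set of color names and that "recovered" means moving to green, blue, or clear, as listed above. The function name `sends_recovery` is made up for illustration.

```python
# Hypothetical model of the proposed recovery rule; not Hobbit's code.
# Assumptions: a rule's COLOR setting is a set of color names, and a
# status has "recovered" when it moves to green, blue or clear.

# Colors that count as "recovered" in the proposal.
RECOVERY_TARGETS = frozenset({"green", "blue", "clear"})


def sends_recovery(rule_colors, old_color, new_color):
    """True if a rule with COLOR=rule_colors should send a RECOVERED
    message when a status changes from old_color to new_color."""
    # Only recover from a color the rule would have alerted on ...
    alerted = old_color in rule_colors
    # ... and only when the new color is a "recovered" one.
    return alerted and new_color in RECOVERY_TARGETS


if __name__ == "__main__":
    red_purple = {"red", "purple"}
    default = {"red", "yellow", "purple"}
    print(sends_recovery(red_purple, "yellow", "green"))  # False: yellow never paged
    print(sends_recovery(red_purple, "red", "green"))     # True
    print(sends_recovery(default, "yellow", "clear"))     # True
```

Under this model, the yellow-to-green transition Bruce describes sends nothing for a COLOR=red,purple rule. The default COLOR list still recovers from yellow.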
Or maybe I'm braindead and can't see how this can be accomplished currently.
Comments?
-- Bruce Z. Lysik <blysik at shutterfly.com> Operations Engineer
On Thu, Mar 03, 2005 at 03:13:31PM -0800, Bruce Lysik wrote:
> Recently however, I've gotten some requests from people who want to get alerted on a few monitors when they yellow. No problem, I thought. I'll just change --alertcolors back to default, and then add COLOR=red,purple to all the existing alert definitions to start.
> This caused the problem where a monitor would go into a yellow state (not causing a page because COLOR=red,purple) and then go back to green (which would then send a recovery page). This current behaviour doesn't make sense.
The current code - i.e. RC4 plus the post-RC4 patch, plus the fix I sent out yesterday to stop alerts from going off every minute - should behave the way you want. As you say, it doesn't make sense to get a recovery message when you didn't get the alert. I just tested it to be absolutely certain it behaves, and it does.
If you're unsure which version you have (quite understandable): unpack the hobbit-4.0-RC4 archive, grab the latest post-RC4 patch (I updated it yesterday) from http://www.hswn.dk/beta/, and apply it with "cd hobbit-4.0-RC4; patch -p0 </tmp/post-RC4.patch". Then copy over your Makefile from the old setup and run "make" and "make install".
Henrik
On Fri, Mar 04, 2005 at 07:40:17AM, Henrik Stoerner wrote:
> On Thu, Mar 03, 2005 at 03:13:31PM -0800, Bruce Lysik wrote:
> The current code - i.e. RC4 plus the post-RC4 patch, plus the fix I sent out yesterday to stop alerts from going off every minute - should behave the way you want. As you say, it doesn't make sense to get a recovery message when you didn't get the alert. I just tested it
> to be absolutely certain it behaves, and it does.

I'd like to know how you test. That would add another Hobbit debugging skill to my list :-)
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu "It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change." - Charles Darwin
On Fri, Mar 04, 2005 at 02:19:54PM -0500, Asif Iqbal wrote:
> On Fri, Mar 04, 2005 at 07:40:17AM, Henrik Stoerner wrote:
>> On Thu, Mar 03, 2005 at 03:13:31PM -0800, Bruce Lysik wrote:
>> The current code - i.e. RC4 plus the post-RC4 patch, plus the fix I sent out yesterday to stop alerts from going off every minute - should behave the way you want. As you say, it doesn't make sense to get a recovery message when you didn't get the alert. I just tested it
> I'd like to know how you test. That would add another Hobbit debugging skill to my list
I have Hobbit running on my workstation, just monitoring itself. Then I set up bb-hosts or hobbit-alerts.cfg as needed for the test I want to do; e.g. here I added some extra alert rules:
    HOST=osiris.hswn.dk
        MAIL henrik@hswn.dk REPEAT=1h COLOR=red RECOVERED
        MAIL henrik-yellow@hswn.dk REPEAT=1h COLOR=red,yellow RECOVERED
Then I restarted Hobbit and fired off a yellow and a red alert:
bb 127.0.0.1 "status osiris,hswn,dk.test1 yellow date Test Y"
bb 127.0.0.1 "status osiris,hswn,dk.test1 red date Test R"
and watched which emails were sent: I should get one message for the yellow status and two for the red. Once those arrived, I repeated the "bb" commands with a green status and checked which recovery messages showed up.
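To make the expected result of that test explicit, here is a hypothetical Python simulation of the two MAIL recipients from the rule above. It is not Hobbit's code, and it rests on three assumptions:
- An alert goes out whenever the status changes to a color in a recipient's COLOR list.
- A recovery goes out only to recipients that were alerted, and only when the status moves to green, blue or clear.
- REPEAT=1h and other timing details can be ignored.

The function `simulate` is made up for illustration.

```python
# Hypothetical simulation of the alert test above; not Hobbit's code.
# Assumptions: an alert goes out when the status changes to a color in a
# recipient's COLOR list; RECOVERED goes out only to recipients that were
# alerted, when the status changes to green, blue or clear. REPEAT=1h and
# other timing details are not modelled.

RECOVERY_TARGETS = {"green", "blue", "clear"}

# The two MAIL recipients from the test rule, mapped to their COLOR lists.
RULES = {
    "henrik@hswn.dk": {"red"},
    "henrik-yellow@hswn.dk": {"red", "yellow"},
}


def simulate(colors, rules=RULES):
    """Return (color, recipient, kind) tuples for a sequence of status
    colors sent for one host/test, where kind is "alert" or "recovered"."""
    alerted = set()  # recipients holding an alert that has not recovered
    sent = []
    previous = None
    for color in colors:
        if color == previous:
            continue  # only status changes trigger messages in this model
        previous = color
        for recipient, rule_colors in rules.items():
            if color in rule_colors:
                sent.append((color, recipient, "alert"))
                alerted.add(recipient)
            elif color in RECOVERY_TARGETS and recipient in alerted:
                sent.append((color, recipient, "recovered"))
                alerted.discard(recipient)
    return sent


if __name__ == "__main__":
    # Henrik's sequence: yellow, then red, then green.
    for message in simulate(["yellow", "red", "green"]):
        print(message)
    # Bruce's case: yellow then green must not notify the COLOR=red recipient.
    print(simulate(["yellow", "green"]))
```

Under these assumptions, the first sequence produces one yellow message, two red messages and two recoveries. The second call shows Bruce's case: the COLOR=red recipient gets neither the yellow alert nor a recovery.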
Henrik
participants (3)
- blysik@shutterfly.com
- henrik@hswn.dk
- iqbala-hobbit@qwestip.net