On Tue, 29 Mar 2011 22:54:25 -0400, Elizabeth Schwartz <betsy.schwartz at gmail.com> wrote:
> First, let me say that this is very nifty. Flap detection makes folks look at things that they might have missed.
Glad you like it :-)
> It's driving the NOC folks **nuts** though. Acking the reds should stop them from paging, but the main page then stays red for a full half hour, even though the problem is completely fixed. IMHO it would be very useful to have a "release" or "ALL CLEAR" button of some sort for flapping situations that have been dealt with. The NOC folks hate red screens...
Well ... yes, I see your point but I am not sure I agree with it.
If your NOC folks are using the "critical view", then they can ack the alert, and it's gone from their view. That is how I think it/they should work :-)
I know a lot of sites use the "All non-green" view or even the full overview pages for monitoring, and the ack won't change the color there. If you must have a green display in that case, then you can disable the status (make it "blue") for 30 minutes, and then it will return to the real status after that half hour has passed. But of course, any errors during that period will not show up until the disable-period expires.
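For reference, a minimal sketch of the disable approach, assuming the "disable HOSTNAME.TESTNAME DURATION text" message format described in the xymon(1) man page, where the duration is in minutes by default. The helper name build_disable_msg is made up for illustration, and "server1" / "disk" are just example names.

```shell
# Hypothetical helper: build a Xymon "disable" message for one status column.
# Arguments: host, column, duration in minutes, then free-form explanation text.
build_disable_msg() (
    host=$1; column=$2; minutes=$3; shift 3
    printf 'disable %s.%s %s %s' "$host" "$column" "$minutes" "$*"
)

# Usage (not executed here): send the message to the local xymond.
# xymon 127.0.0.1 "$(build_disable_msg server1 disk 30 Flapping dealt with)"
```

Keeping the message construction separate from the send makes it easy to check what will be sent before it reaches xymond.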
There may be a third possibility that does what you're asking for. I think (haven't tested it) that the new "modify" command would override a flapping status. If you have a "disk" status on the "server1" host, then a command like this
xymon 127.0.0.1 "modify server1.disk green manual Disk cleanup completed"
will override the normal status-color and force the status green with the comment "Disk cleanup completed". The "manual" keyword is just a token to identify this modification. However, a modification is only valid for 2 status-updates, so it won't handle the full 30-minute period. It wouldn't be terribly difficult to modify xymond to allow modifiers to be valid for a longer period of time.
This could easily be wrapped into the status display when a flapping status is shown.
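Until modifiers can last longer, one possible workaround is to re-send the "modify" message at a fixed interval for the 30 minutes. This is only a sketch: it assumes that a repeated "modify" simply renews the override, which is as untested as the command above. The hold_green function and the XYMON_CMD / SLEEP_CMD variables are hypothetical names, introduced here so the loop can be dry-run without a xymond; the 5-minute interval in the usage line is also an assumption about the status update frequency.

```shell
# Hypothetical helper: re-send a "modify ... green manual ..." message
# every EVERY_MIN minutes until TOTAL_MIN minutes are covered.
# Arguments: host.column, total minutes, interval minutes, then the cause text.
# XYMON_CMD and SLEEP_CMD are illustration-only overrides for dry runs.
hold_green() (
    target=$1; total_min=$2; every_min=$3; shift 3
    cause=$*
    elapsed=0
    while [ "$elapsed" -lt "$total_min" ]; do
        ${XYMON_CMD:-xymon} 127.0.0.1 "modify $target green manual $cause"
        elapsed=$((elapsed + every_min))
        # Only wait if another round is still needed.
        if [ "$elapsed" -lt "$total_min" ]; then
            ${SLEEP_CMD:-sleep} $((every_min * 60))
        fi
    done
)

# Usage (not executed here): hold server1.disk green for 30 minutes.
# hold_green server1.disk 30 5 Disk cleanup completed
```

Setting XYMON_CMD=echo and SLEEP_CMD=: prints the messages instead of sending them, which is a cheap way to check the loop before pointing it at a real server. Note that, like the disable approach, this hides any genuine errors on that status while the loop is running.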
Regards, Henrik