DURATION rules for specific host alerts

gumby3203＠gmail.com

22 Jun 2007

2:49 p.m.

Is there a [non-messy] way to set a DURATION rule for a specific host alert? Basically, what I'm thinking of is something like this:

In hobbit-clients.cfg:

HOST=myhost
    LOAD 20 30 DURATION>5m

The effect being, the status of the "myhost" cpu alert will only change to yellow/red if the load is above the appropriate threshold for more than 5 minutes.

There are a few hosts that occasionally spike above the cpu load thresholds, but only for a few minutes (usually around 5 min at most), and then recover on their own. However, I don't want to raise the thresholds, because a sustained load (more than 10 minutes) at this level _is_ actually a critical event. It's just not critical if it is only a momentary spike.

My specific example is with cpu load, but it could be for other things too, such as process counts, memory, or even in some situations, disk space.
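The behaviour requested above can be sketched outside of Hobbit as a small duration-gated check. This is only an illustration of the idea, not an existing Hobbit feature: the class name `DurationGatedLoad` and its fields are hypothetical, and timestamps are assumed to be plain seconds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DurationGatedLoad:
    """Hypothetical helper (not part of Hobbit): the colour only changes
    once the load has stayed over a threshold for more than min_seconds."""
    warn: float          # yellow threshold, e.g. 20
    panic: float         # red threshold, e.g. 30
    min_seconds: float   # required breach duration, e.g. 300 for 5m
    _warn_since: Optional[float] = field(default=None, init=False, repr=False)
    _panic_since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, now: float, load: float) -> str:
        # Remember when each threshold was first crossed; forget on recovery.
        if load >= self.warn:
            if self._warn_since is None:
                self._warn_since = now
        else:
            self._warn_since = None
        if load >= self.panic:
            if self._panic_since is None:
                self._panic_since = now
        else:
            self._panic_since = None

        # Report a colour only after the breach has lasted long enough.
        if self._panic_since is not None and now - self._panic_since > self.min_seconds:
            return "red"
        if self._warn_since is not None and now - self._warn_since > self.min_seconds:
            return "yellow"
        return "green"
```

With `DurationGatedLoad(warn=20, panic=30, min_seconds=300)`, a spike that recovers within five minutes never leaves green, while a load that stays above 20 for longer than that eventually reports yellow.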

dbourque＠weatherdata.com

22 Jun

3:12 p.m.

New subject: [hobbit] DURATION rules for specific host alerts

Why would you not want the status to change? Such a history log is great for troubleshooting.

If you don't want to be notified about it, just use this in hobbit-alerts.cfg:

Page=x IGNORE HOST=foo SERVICE=cpu COLOR=red DURATION<5m

If you don't want it to change the status color on the parent pages, then use NOPROPYELLOW:cpu in the bb-hosts file.

If you REALLY don't want it to change status, increase the LOAD numbers in the hobbit-clients.cfg file.
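The difference between the first suggestion and the original request is that the status still changes and is still logged; only the notification is held back. A minimal sketch of that filtering logic follows, assuming the event's age is available as a `timedelta`. The function names `is_ignored` and `handle` are hypothetical and only mirror the intent of the IGNORE rule above; they are not Hobbit internals.

```python
from datetime import timedelta
from typing import List


def is_ignored(color: str, event_age: timedelta,
               max_age: timedelta = timedelta(minutes=5)) -> bool:
    """Hypothetical mirror of the rule above: a red cpu event is ignored
    while it is younger than max_age (DURATION<5m)."""
    return color == "red" and event_age < max_age


def handle(history: List[str], color: str, event_age: timedelta) -> bool:
    """Record every status change, but only ask for a notification
    when the event is not ignored. Returns True if a page should go out."""
    history.append(color)  # the history log is kept regardless
    return color == "red" and not is_ignored(color, event_age)
```

A three-minute-old red event is appended to the history but does not page; the same event at five minutes or older does.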

-Dan

Gary Baluha wrote:

Is there a [non-messy] way to set a DURATION rule for a specific host alert? Basically, what I'm thinking of is something like this:

In hobbit-clients.cfg:

HOST=myhost
    LOAD 20 30 DURATION>5m

The effect being, the status of the "myhost" cpu alert will only change to yellow/red if the load is above the appropriate threshold for more than 5 minutes.

There are a few hosts that occasionally spike above the cpu load thresholds, but only for a few minutes (usually around 5 min at most), and then recover on their own. However, I don't want to raise the thresholds, because a sustained load (more than 10 minutes) at this level _is_ actually a critical event. It's just not critical if it is only a momentary spike.

My specific example is with cpu load, but it could be for other things too, such as process counts, memory, or even in some situations, disk space.

gumby3203＠gmail.com

5:36 p.m.

On 6/22/07, Daniel Bourque <dbourque at weatherdata.com> wrote:

Why would you not want the status to change? Such a history log is great for troubleshooting.

I wouldn't want the status to change, because I'm essentially making it a two-part threshold; one part based on the hard-and-true numeric value, and another threshold based on the length of time.

If you don't want to be notified about it, just use this in hobbit-alerts.cfg:

Page=x IGNORE HOST=foo SERVICE=cpu COLOR=red DURATION<5m

Ahh, that's the sort of hobbit-alerts rule that would work for me, at least until (if?) there is a way to do what I'm looking for in hobbit-clients.cfg.

If you don't want it to change the status color on the parent pages, then use NOPROPYELLOW:cpu in the bb-hosts file.

If you REALLY don't want it to change status, increase the LOAD numbers in the hobbit-clients.cfg file.

The problem is that it is only a problem if the load is _sustained_ for more than 10 minutes or so. If I set the red threshold to Y, and the load momentarily spikes to Y+1, it isn't a problem. But if I raise the threshold to Y+2 and now I get a sustained load of Y+1, it would be a problem since I wouldn't get alerted.

Essentially, I'm looking for a sort of time-based hysteretic monitoring.
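The Y / Y+1 / Y+2 argument can be checked with a small worked example. The sketch below assumes one load sample per minute and a made-up value for Y; both function names are hypothetical. It compares a plain threshold check against a check that also requires the breach to last more than a given number of minutes.

```python
from typing import List, Optional


def alerts_with_threshold(samples: List[float], red: float) -> List[int]:
    """Minutes at which a plain threshold check reports red."""
    return [minute for minute, load in enumerate(samples) if load >= red]


def alerts_with_duration(samples: List[float], red: float,
                         min_minutes: int) -> List[int]:
    """Minutes at which red is reported only after the load has stayed
    at or above `red` for more than min_minutes."""
    alerts: List[int] = []
    streak_start: Optional[int] = None
    for minute, load in enumerate(samples):
        if load >= red:
            if streak_start is None:
                streak_start = minute      # breach begins
            if minute - streak_start > min_minutes:
                alerts.append(minute)      # breach has lasted long enough
        else:
            streak_start = None            # recovered, reset the timer
    return alerts


Y = 30  # hypothetical red threshold
spike = [10, 10, Y + 1, Y + 1] + [10] * 8   # two-minute spike to Y+1
sustained = [10] + [Y + 1] * 11             # eleven minutes at Y+1

print(alerts_with_threshold(spike, Y))          # spike triggers red at threshold Y
print(alerts_with_threshold(sustained, Y + 2))  # raised threshold misses sustained load
print(alerts_with_duration(spike, Y, 5))        # duration gate ignores the spike
print(alerts_with_duration(sustained, Y, 5))    # duration gate catches sustained load
```

Under these assumptions the plain check at Y fires on the spike, the raised threshold Y+2 never fires on the sustained load, and the duration-gated check at Y stays quiet for the spike but fires once the sustained load passes the five-minute mark.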

