Hi Henrik,
Thanks for replying.
On 02/ 7/11 01:10 PM, Henrik Størner wrote:
In <4D4C0F83.8080204 at unil.ch> Dominique Frise <dominique.frise at unil.ch> writes:
What is the minimum time for the same alert status to stay up to be processed correctly by Xymon ?
I am not sure I understand the question - are you saying that Xymon does not generate the notifications you expect it to ?
Sort of...
We have SNMP trap handling configured (thanks Andy Farrior) but are not completely happy with how it handles the alerting. When a bad trap from a given host is received, an alert status (yellow or red) is generated for Xymon. So far, so good. Then, before this status's validity expires (before it turns purple), a periodically launched script resets its color to green, thus generating a recovered message independently of the real status of the service reported by the trap. Furthermore, while a <host>.trap status is in alert state, other bad traps from the same host and of the same level will not generate any alerts (they are ignored).
Here follows a description of what we are trying to implement in order to improve this handling:
- a bad trap from <host> is detected.
- generate a yellow/red <host>.trap status for Xymon.
- after a short delay (ideally 1 sec.), generate a clear <host>.trap status for Xymon.
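The three steps above can be sketched as follows. This is only an illustration, assuming the standard Xymon "status" message sent over TCP to xymond on port 1984, with dots in the hostname written as commas. The function names (build_status, send_message, report_bad_trap) and the reset text are hypothetical and are not taken from the existing trap handler.

```python
import socket
import time

XYMON_PORT = 1984  # default xymond listening port


def build_status(host, test, color, text):
    """Build a Xymon 'status' message for host.test with the given color."""
    # Xymon expects dots in the hostname to be replaced by commas.
    wire_host = host.replace(".", ",")
    return f"status {wire_host}.{test} {color} {text}"


def send_message(server, message, port=XYMON_PORT, timeout=5.0):
    """Send one message to the Xymon server over a short-lived TCP connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of message


def report_bad_trap(server, host, color, trap_text, delay=1.0, send=send_message):
    """Raise <host>.trap to yellow/red, then set it to clear after a short delay."""
    send(server, build_status(host, "trap", color, trap_text))
    time.sleep(delay)  # assumption: 1 second, as proposed in the list above
    send(server, build_status(host, "trap", "clear", "trap status reset"))
```

The send parameter exists only so the sequence can be exercised without a live server. Whether a 1 second gap is long enough for the alert to be processed is exactly the open question here.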
All trap statuses except those in alert state are periodically set to clear. The red/yellow -> clear transition should not generate a recovered message. This should be achieved by removing "clear" from "OKCOLORS" in xymonserver.cfg, but this does not work without modifying xymond_alert.c. A good <host>.trap should generate a green message and thus a recovered message.
We know that 100% handling of traps in Xymon is not possible because we are misusing a single status (trap) to report many others, but this scenario would allow:
- better alerting on all bad traps from the same host and of the same level.
- a recovered status that reflects a real recovery (the text of the trap explains what recovered).
The issue we have now is that we are missing some alerts. We enabled debug and tracing, but due to the number of alerts we get, it is extremely difficult to follow one single alert. We think this could be related to how xymond_alert handles batches of messages (10 sec. handling).
Can you please confirm?
Thanks for your time.
Dominique
For example, in the following transitions, what would be the minimum time (in sec.) for the yellow statuses (same check) to be processed correctly by Xymon?
long t.   short t.  long t.   short t.  long t.   long t.
green  -> yellow -> clear  -> yellow -> clear  -> green
          alert               alert               recovered
Provided you have alerts set up on a yellow status, and there is no DURATION parameter that delays the alert, you should get an alert on each of the transitions to yellow.
("clear" is not an alerting color - only yellow, red and purple are).
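The transition sequence from the question can be walked through with a toy model. It is not Xymon's implementation: it simply assumes that every transition into an alerting color (yellow, red, purple) produces an alert, and that a recovered message is sent only on a transition into a color listed as OK while an alert is still open. The function name notifications and the ok_colors argument are hypothetical.

```python
ALERT_COLORS = {"yellow", "red", "purple"}


def notifications(colors, ok_colors):
    """Return (color, event) pairs for a sequence of status colors.

    Toy model only: an 'alert' is emitted on each transition into an
    alerting color; 'recovered' is emitted on a transition into one of
    ok_colors while an earlier alert has not yet been recovered.
    """
    events = []
    alert_open = False
    for prev, cur in zip(colors, colors[1:]):
        if cur == prev:
            continue  # no transition, nothing to report
        if cur in ALERT_COLORS:
            events.append((cur, "alert"))
            alert_open = True
        elif cur in ok_colors and alert_open:
            events.append((cur, "recovered"))
            alert_open = False
    return events


sequence = ["green", "yellow", "clear", "yellow", "clear", "green"]

# "clear" not treated as an OK color: two alerts, one recovery on green.
print(notifications(sequence, ok_colors={"green"}))

# "clear" treated as an OK color: each reset to clear looks like a recovery.
print(notifications(sequence, ok_colors={"green", "clear"}))
```

Under these assumptions the first call reproduces the alert / alert / recovered pattern from the diagram, while the second shows the extra recovered messages that appear when clear counts as an OK color.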
The only "minimum time" Xymon has in relation to alerts is the DURATION parameter that you specify in alerts.cfg (hobbit-alerts.cfg in older versions).
Regards, Henrik