[Feature request] Alerts should mention when the colour is X because of flapping
One of my colleagues asked yesterday why he got an alert when the text of the e-mail suggested the status was green, with 0 failures in the past 10 minutes. Looking at the status now, I see that it says:
WARNING: Flapping status
This must be why the status colour was what it was when the alert was generated: the status was flapping. But this fact should be in the e-mail, just as it is in the display, or the alert just looks wrong.

Alternatively, alerts should be suppressed for a flapping status until it actually goes red (or whatever is configured in alerts.cfg) again. This would mean storing the current colour separately from the flapping-locked colour (which I expect is done anyway), using the current colour to match the COLOR rule in alerts.cfg, and using the flapping-locked colour to get the history of how long the status has been in this colour for the purposes of matching against the DURATION and REPEAT rules.

So for an alert configured like this:

    MAIL $pg-opssms COLOR=red DURATION>10 REPEAT=30 RECOVERED

and a status that goes red for 5 minutes, yellow for 6, red for 5, yellow for 5, green for 5, yellow for 5, then red for 30 minutes, an alert should be generated after 11 minutes and 41 minutes, not 10 and 40 (at the 10-minute mark the status is actually yellow, and the repeat then follows 30 minutes after the first alert). This would mean the alert details the current error in its text, rather than the fact that everything is fine at that moment.
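
To make the timing above concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration, not Xymon code: SEGMENTS, colour_at and alert_minutes are hypothetical names, the flapping-locked colour is assumed to stay red for the whole timeline, and the duration rule is assumed to be met once 10 minutes have elapsed, to match the figures quoted above.

```python
# Timeline from the example above: (actual colour, minutes in that colour).
SEGMENTS = [
    ("red", 5), ("yellow", 6), ("red", 5), ("yellow", 5),
    ("green", 5), ("yellow", 5), ("red", 30),
]


def colour_at(minute):
    """Return the actual (unlocked) colour during the given minute."""
    start = 0
    for colour, length in SEGMENTS:
        if start <= minute < start + length:
            return colour
        start += length
    raise ValueError("minute is past the end of the timeline")


def alert_minutes(use_current_colour, duration=10, repeat=30):
    """List the minutes at which a COLOR=red alert would be sent.

    Assumption: the flapping-locked colour is red from minute 0 onwards,
    so the locked duration at any minute is simply that minute's value.
    """
    total = sum(length for _, length in SEGMENTS)
    alerts = []
    for minute in range(total):
        # Duration rule, measured against the flapping-locked colour.
        if minute < duration:
            continue
        # Repeat rule: wait `repeat` minutes after the previous alert.
        if alerts and minute - alerts[-1] < repeat:
            continue
        # Proposed change: also require the actual colour to be red now.
        if use_current_colour and colour_at(minute) != "red":
            continue
        alerts.append(minute)
    return alerts


print("locked colour only:", alert_minutes(False))
print("with current colour:", alert_minutes(True))
```

Under these assumptions, matching on the locked colour alone gives alerts at minutes 10 and 40, while also requiring the current colour to be red moves them to minutes 11 and 41.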
Kind regards,
SebA
spah@syntec.co.uk