My Xymon (Xymon 4.3.30-1.el7.terabithia) no longer notices when it is time to stop sending email alerts.
A customer will ping me, saying "I'm still getting emails for a problem I fixed 10 days ago!"
I find the messages in question in the /notifications.log/. Yep, there are a lot of them. I can see the test recovered ages ago, so there should no longer be any notifications.
If I look in /alert.chk/, I can see the host:test in question.
If I restart Xymon, /alert.log/ gets a bunch of "Stale alert found" lines, but the entries remain in /alert.chk/.
The only way I have found to clean this up is to grep the 'Stale' host:test pairs out of /alert.log/, stop Xymon, feed those host:test pairs through sed to delete the offending lines from /alert.chk/, and restart Xymon.
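For anyone wanting to script that workaround, here is a minimal Python sketch of the same grep-then-delete cleanup. It only automates the symptom fix. It does not explain why xymond_alert keeps the entries. It rests on two assumptions I have not checked against the Xymon source: (1) each stale entry in /alert.log/ contains "Stale alert found" followed by a host:test token, and (2) each /alert.chk/ line begins with "host|test|". The script name prune_stale.py is hypothetical. File paths are passed as arguments, and Xymon must already be stopped before it runs, just as in the manual procedure.

```python
import re
import shutil
import sys
from typing import Set, Tuple

# ASSUMPTION: a stale entry in alert.log contains "Stale alert found" followed
# somewhere later on the line by a host:test token. Adjust to the real log text.
STALE_RE = re.compile(r"Stale alert found.*?([\w.-]+):([\w.-]+)")


def stale_pairs(log_text: str) -> Set[Tuple[str, str]]:
    """Collect (host, test) pairs from 'Stale alert found' lines (the grep step)."""
    pairs = set()
    for line in log_text.splitlines():
        m = STALE_RE.search(line)
        if m:
            pairs.add((m.group(1), m.group(2)))
    return pairs


def filter_checkpoint(chk_text: str, pairs: Set[Tuple[str, str]]) -> str:
    """Drop checkpoint lines for stale pairs (the sed step).

    ASSUMPTION: alert.chk lines start with 'host|test|'.
    """
    kept = []
    for line in chk_text.splitlines(keepends=True):
        fields = line.split("|")
        if len(fields) >= 2 and (fields[0], fields[1]) in pairs:
            continue  # stale entry: leave it out
        kept.append(line)
    return "".join(kept)


def main(argv) -> None:
    # Run only while Xymon is stopped, so xymond_alert does not rewrite the file.
    if len(argv) != 3:
        sys.exit("usage: prune_stale.py ALERT_LOG ALERT_CHK")
    log_path, chk_path = argv[1], argv[2]
    with open(log_path) as f:
        pairs = stale_pairs(f.read())
    with open(chk_path) as f:
        original = f.read()
    shutil.copy2(chk_path, chk_path + ".bak")  # keep a backup before rewriting
    with open(chk_path, "w") as f:
        f.write(filter_checkpoint(original, pairs))
    print(f"found {len(pairs)} stale host:test pair(s)")


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the parsing and filtering as pure functions over strings makes it easy to check the assumed formats against a copy of the real files before letting the script touch a live /alert.chk/.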
Anyone have any ideas what's wrong here?
--
Do things because you should, not just because you can.
John Thurston 907-465-8591 John.Thurston at alaska.gov Department of Administration State of Alaska