We recently set up automated ReaR/fsarchiver backups on our systems. On some of our machines running older OSes this created havoc, as something ("fsarchiver5", we think) ran roughshod over the disk and deleted thousands of files instead of archiving/backing them up.
This caused "xymonlaunch" to crash on a few of them. Despite having restored the missing files from tape backups, repeatedly relaunched the "xymon-client" service, and even done a reinstall/upgrade (to 4.3.28), we're still getting these purple alerts. (It feels like Xymon is seeing some file lying around and deciding that "xymonlaunch" is still in crashed status because of it.)
How do I fix this? I suppose I could do 'xymon 127.0.0.1 "drop myhost xymond"' on my monitoring host but that just doesn't feel right ...
Thanks,
- Greg
Forwarded message:
From: xymon Monitor <xymon at monitor.my.do.main>
To: root at xymonmonitor.my.do.main
Subject: Xymon [0] myhost:xymond stopped reporting (PURPLE)
Date: Sat, 30 Dec 2017 15:37:04 -0800

red (Check time of report) - xymonlaunch program crashed
Fatal signal caught!
See http://myhost/xymon-cgi/svcstatus.sh?HOST=myhost&SERVICE=xymond