On Mon, February 1, 2016 4:59 pm, John Thurston wrote:
On 2/1/2016 2:41 PM, J.C. Cleaver wrote:
Hi,
Actually, I think I must have missed your final response on this at http://lists.xymon.com/pipermail/xymon/2015-December/042787.html ; my apologies.
On what's happening, I think this might be a side-effect of https://sourceforge.net/p/xymon/code/7651/ , which added a dummy record for the purposes of command-line --test functionality when the host doesn't exist. For an incoming unknown host (from xymond_alert's perspective), the same path is being executed.
I've applied the patch to my non-production server and performed my failure-reproduction steps. The behavior is certainly better. The alert process is no longer tanking for every message received :)
What I do get, for a newly added host, is "Checking criteria for host 'foo.bar.com', which is not defined. Will not alert until hostlist reload." This happens following all subsequent runs of xymonnet.
Is there anything which will trigger a hostlist reload?
Is there a tidy way to manually reload the list?
It doesn't seem to happen until I kill the "xymond_channel --channel=page" process. This seems like a hamfisted thing to do after every edit of hosts.cfg :(
Related question:
If this is in the main code, and not some oddball null/EOF/POSIX problem (as has often tripped up my Solaris systems in the recent past), why am I the only one seeing this failure? Why aren't the folks running Linux having their alerts fail?
This one took me quite a while to figure out, mainly because I was looking at the wrong code base for a while. It turns out that, within the context of alerting, the host info record here is *only* used for display groups and holiday lookups (probably rarely used). In all other cases, a host's absence from the hostlist doesn't affect how alert rules are applied, since all the needed info comes in via the '@@page' message itself. The patch should be updated to let those messages come straight through instead of exiting out when the host isn't found.

My confusion came from a different issue: xymond_alert actually never reloads the hosts config at all! I found and fixed this back in Sept '14 in the RPMs, but the fix wasn't applied to 4.3 at the time. I'd been living with that code for so long that I forgot the reload wasn't needed here, and, obviously, alerts have been working *in general*... (We only noticed the missing reload because we depended on a dynamic value in the hosts.cfg line reaching the alert script, in updated form, via XMH_RAW.) Reloading in xymond_alert was added to 4.4 in https://sourceforge.net/p/xymon/code/7776/ as part of the patch bursts, but the live host-add issue has probably been present since this release.

There are a few takeaways from this... but this needs to be fixed in 4.3 (along with several other incoming issues that are pending confirmation).

Can you please check the two included patches? One updates the previous patch so that the alert check passes through (adding the dummy record only in --test mode in the first place); the other adds hosts.cfg reloading at intervals or on demand. It's based on the 4.4 version, with only a small change. I'd like to add both, as I can't see any drawback to reloading hosts.cfg from xymond_alert's perspective, but the first may be sufficient to get back to the status quo.

Regards,
-jc