It looks like this was Devmon-related.
I think you can see from this where it started going wrong (1), where it broke (2), where I restarted Xymon (3) and where I restarted Devmon (4).

I’m implementing a weekly Devmon restart, as this is not the first time I’ve had issues with excessive load from Devmon, although it is the first time it has completely broken Xymon.
Neil
Neil Simmonds, Senior Platform & Middleware Engineer (Unix)
Phone: 0344 245 9200 | Address: Unit A Brook Park East, NG20 8RY
Credit is provided by Frasers Group Financial Services Limited (Registered Company no. 00718151), which is authorised and regulated by the Financial Conduct Authority (FRN 311908) for consumer credit and general insurance and a member of the Finance and Leasing Association. Frasers Group Financial Services Limited is registered in England and its registered office is: Express House, Petre Road, Clayton Business Park, Accrington, Lancashire, BB5 5JB. Studio is a trading name used by both Frasers Group Financial Services Limited and Studio Retail Trading Limited. For regulated payment services, Frasers Group Financial Services Limited is a payment agent of Transact Payments Limited, a company authorised and regulated by the Gibraltar Financial Services Commission as an electronic money institution. Transact Payments Limited, a company incorporated in Gibraltar (No 108217). Registered address 6.20 World Trade Center, 6 Bayside Road, Gibraltar, GX11 1AA.
From: Neil Simmonds via Xymon <xymon@xymon.com>
Sent: Monday, October 28, 2024 3:01 PM
To: Jeremy Laidman <jeremy@laidman.org>; Xymon mailinglist <xymon@xymon.com>
Cc: Neil Simmonds <Neil.Simmonds@frasers.group>
Subject: [Xymon] Re: Serious issue with Xymon hanging
Hi Jeremy,
Answers below.
Cheers,
Neil Simmonds
From: Jeremy Laidman <jeremy@laidman.org>
Sent: Monday, October 28, 2024 2:51 PM
To: Xymon mailinglist <xymon@xymon.com>
Cc: Neil Simmonds <Neil.Simmonds@frasers.group>
Subject: Re: [Xymon] Serious issue with Xymon hanging
On Mon, 28 Oct 2024 at 23:12, Neil Simmonds via Xymon <xymon@xymon.com> wrote:
Hi all,
This morning our Xymon system (Xymon 4.3.30-1.el8.terabithia on RHEL 8.6) was showing the main screens, but when I clicked on any status to see the data I got “no such host”. I restarted Xymon and everything came back, but all pages completely reloaded. We’re missing RRD data for almost 24 hours.
Neil, that's quite a nasty incident. I'm hoping you had no undetected faults due to Xymon being inoperable. – Yes, we did, but thankfully only one was important, and it’s been resolved now; it just caused some extra work.
At 08:30 yesterday we started getting the following in xymongen.log, repeated every minute until I restarted Xymon at 08:25 this morning.
2024-10-27 08:30:30.501407 xymond status-board not available, code 0
2024-10-27 08:30:30.501489 Failed to load current Xymon status, aborting page-update
There's nothing in xymonlaunch.log or xymonnet.log, but at that time we got the following in xymond.log (actual server names have been changed to “servername”):
2024-10-27 08:30:13.048311 WARNING: Cannot open directory /etc/xymon/hosts.d
2024-10-27 08:30:13.048338 WARNING: Cannot open directory /etc/xymon/v9hosts.d
2024-10-27 08:30:13.048344 WARNING: Cannot open directory /etc/xymon/dynamicHosts.d
Do these three directories exist, and do they have files in them? – Yes, all three directories exist, contain files, and are owned by the xymon user.
2024-10-27 08:30:13 Flushing filecache
2024-10-27 08:30:13 Rescanning host tree
2024-10-27 08:30:13 Sending dropstate (from xymond) with servername
2024-10-27 08:30:13 Sending dropstate (from xymond) with servername
<snip>
2024-10-27 08:30:13 Sending dropstate (from xymond) with servername, etc., etc.
Presumably these server names were all different entries from your hosts.cfg file(s)? – Yes.
Then we repeatedly got:
2024-10-27 08:35:13 Reloading hostnames
2024-10-27 08:35:13 Flushing filecache
2024-10-27 08:35:13 Reloading client config
2024-10-27 08:35:58.233008 Bogus message from 10.105.3.100: Invalid new hostname ‘servername'
2024-10-27 08:35:58.233102 Bogus message from 10.105.3.100: Invalid new hostname ' servername'
Again, does servername stand for various _different_ servers? And did each one have a leading space inside the quotes, or is that perhaps a by-product of your sanitising? – The space is a by-product; these were just the same two devices repeatedly, and those devices are monitored via Devmon.
The "Bogus message...Invalid new hostname" message comes from xymond when it receives a hostname it doesn't recognise (a ghost) AND that hostname contains a character outside the set [a-zA-Z0-9:,._-]. This is consistent with a leading space in the hostname, as if it had been misconfigured on the client; however, if you suddenly get lots of these from different hosts, it seems more likely that the bad hostname was corrupted within xymond.
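As a rough illustration of the rule described above, here is a minimal Python sketch. The function name `is_bogus_new_hostname` is hypothetical and this is not xymond's actual code; it simply encodes the two conditions: the hostname is unknown to xymond, and it contains a character outside [a-zA-Z0-9:,._-].

```python
import re

# Characters described as valid in a Xymon hostname: [a-zA-Z0-9:,._-]
VALID_HOSTNAME = re.compile(r"[a-zA-Z0-9:,._-]+")


def is_bogus_new_hostname(hostname: str, known_hosts: set[str]) -> bool:
    """Hypothetical helper: True when a message would be logged as
    'Bogus message ... Invalid new hostname', i.e. the host is unknown
    (a ghost) AND its name contains a character outside the allowed set."""
    is_ghost = hostname not in known_hosts
    has_bad_char = VALID_HOSTNAME.fullmatch(hostname) is None
    return is_ghost and has_bad_char


if __name__ == "__main__":
    known = {"servername"}
    for name in ["servername", " servername", "newhost", " newhost"]:
        print(repr(name), is_bogus_new_hostname(name, known))
```

Under this rule a ghost with a clean name (such as `newhost`) is not reported this way, whereas a name with a leading space, like `' servername'`, fails the character check and triggers the message.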
2024-10-27 08:36:00 Saving checkpoint file
2024-10-27 08:36:38 Generating stats
2024-10-27 08:36:38.074722 xymond servername MACHINE='ukawsmon01' not listed in hosts.cfg, dropping xymond status
2024-10-27 08:36:57.044131 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:36:57.044237 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:37:56.564893 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:37:56.564989 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:39:02.010103 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:39:02.010177 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:40:00.879305 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
2024-10-27 08:40:00.879360 Bogus message from 10.105.3.100: Invalid new hostname ' servername’
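To apply the heuristic above to a whole log rather than an excerpt, a small script can count these messages per sender and hostname: a handful of names repeating (as Neil found, two Devmon-polled devices) suggests a client-side problem, while many different names would point more towards corruption inside xymond. This is a hypothetical helper; the regular expression is built from the log lines quoted here, and the log path in the usage comment is an assumption.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   2024-10-27 08:36:57.044131 Bogus message from 10.105.3.100: Invalid new hostname ' servername'
# Quote characters are matched loosely (straight or curly) in case the log
# text has passed through a mail client.
BOGUS = re.compile(
    r"Bogus message from (?P<sender>\S+): Invalid new hostname "
    r"['\u2018\u2019](?P<host>.*)['\u2018\u2019]\s*$"
)


def summarise_bogus(lines):
    """Count 'Invalid new hostname' messages per (sender, hostname) pair."""
    counts = Counter()
    for line in lines:
        m = BOGUS.search(line)
        if m:
            counts[(m.group("sender"), m.group("host"))] += 1
    return counts


if __name__ == "__main__":
    # Usage (log path is an assumption): python3 summarise_bogus.py /var/log/xymon/xymond.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        counts = summarise_bogus(fh)
    hosts = {host for _, host in counts}
    for (sender, host), n in counts.most_common():
        print(f"{n:6d}  {sender}  {host!r}")
    print(f"{len(hosts)} distinct hostname(s)")
```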
Has anyone ever seen this before?
I haven't.
But my best guess is that xymond tried to re-read its hosts.cfg file(s) for some reason but wasn't able to. If xymond knows no hosts, then xymonnet won't probe any hosts, and client messages will be dropped as ghost messages. Could this have been a filesystem error causing corruption and preventing access to the file? That doesn't explain why it all started working after a reload, however. Could someone have accidentally altered the permissions of hosts.cfg (or the directory containing it) so that xymond couldn't read it?
Nobody should have been accessing it on a Sunday morning, but I must admit I did wonder whether we’d had a disk space issue or something similar. I’m not at all sure, and I wasn’t able to find any evidence that we had.
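Both theories (permissions and a full filesystem) can at least be spot-checked. The sketch below is a hypothetical example, not a Xymon tool: it checks that the directories named in the xymond.log warnings exist, are readable and non-empty, and that the filesystem holding /etc/xymon has some free space. The path /etc/xymon/hosts.cfg and the 5% free-space threshold are assumptions. Run it as the xymon user (for example `sudo -u xymon python3 check_xymon_cfg.py`, script name hypothetical) so the permission checks reflect what xymond sees.

```python
import os
import shutil

# Directories taken from the xymond.log warnings; /etc/xymon/hosts.cfg is an
# assumption about where this installation keeps its main hosts file.
PATHS = [
    "/etc/xymon/hosts.cfg",
    "/etc/xymon/hosts.d",
    "/etc/xymon/v9hosts.d",
    "/etc/xymon/dynamicHosts.d",
]

# Hypothetical threshold for "low on space"; adjust to taste.
MIN_FREE_FRACTION = 0.05


def check_path(path: str) -> list[str]:
    """Return problems that would stop the current user reading path."""
    if not os.path.exists(path):
        return [f"{path}: does not exist or cannot be reached"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable")
    if os.path.isdir(path):
        if not os.access(path, os.X_OK):
            problems.append(f"{path}: directory not searchable")
        if not problems:
            try:
                if not os.listdir(path):
                    problems.append(f"{path}: directory is empty")
            except OSError as exc:
                problems.append(f"{path}: cannot list ({exc})")
    return problems


def check_free_space(path: str, min_fraction: float = MIN_FREE_FRACTION) -> list[str]:
    """Flag the filesystem holding path if its free fraction is below min_fraction."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    if free_fraction < min_fraction:
        return [f"{path}: only {free_fraction:.1%} free on its filesystem"]
    return []


if __name__ == "__main__":
    issues = []
    for p in PATHS:
        issues.extend(check_path(p))
    if os.path.exists("/etc/xymon"):
        issues.extend(check_free_space("/etc/xymon"))
    print("\n".join(issues) if issues else "no problems found")
```

This only reports the state at the moment it runs, so it cannot show what happened on Sunday morning; it would need to run periodically (for example from cron) to catch a recurrence.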
Cheers
Jeremy