Hi Jeremy,
thanks a lot for your outside view. Maybe I was too focused on the integer underflow to see other potential issues...
On Thu, Aug 13, 2020 at 01:59:50PM +1000, Jeremy Laidman wrote:
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <abe at deuxchevaux.org> wrote:
At least in Xymon 4.3.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an integer underflow in the FILE mtime/ctime/atime check. It seems to happen when finding the file to check takes a moment (seconds) and the file in question is modified after the search for it has started.
Is it possible that the time on the host being monitored is in the future, at least compared to the time of the Xymon server?
Hmmmm, you've got a point there. I expected all our servers to have NTP, but the server in question was set up manually by a coworker and indeed has the wrong time. However, it is about 15 seconds behind the Xymon server (which definitely has NTP, and I also compared it with two other machines). Then again, the box in question has ntp installed, but it wasn't running. I should probably monitor this. ;-)
Client with timestamp issues: Thu Aug 13 11:53:03 CEST 2020
Xymon Server: Thu 13 Aug 2020 09:53:18 AM UTC
I assume the different time zones (CEST being UTC+02:00) are not an issue, as we have quite a few clients on CEST.
Here is the same difference again, this time in Unix epoch seconds:
Client: 1597312461
Server: 1597312477
Always about 15 to 16 seconds difference, but the client is lagging behind.
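As a cross-check of the two measurements above, here is a minimal Python sketch that compares the wall-clock readings with their time zones taken into account, and then the raw epoch values. The variable names are only illustrative, and the two pairs of readings were taken at different moments, so they are compared separately.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+02:00, as stated in the mail.
CEST = timezone(timedelta(hours=2), "CEST")

# Wall-clock readings quoted above.
client_wall = datetime(2020, 8, 13, 11, 53, 3, tzinfo=CEST)
server_wall = datetime(2020, 8, 13, 9, 53, 18, tzinfo=timezone.utc)

# Subtracting timezone-aware datetimes compares absolute instants,
# so the CEST/UTC difference drops out.
wall_skew = (server_wall - client_wall).total_seconds()

# Epoch readings quoted above (a separate measurement).
client_epoch = 1597312461
server_epoch = 1597312477
epoch_skew = server_epoch - client_epoch

print(f"wall-clock skew: {wall_skew:.0f} s, epoch skew: {epoch_skew} s")
```

A positive result means the server is ahead of the client, i.e. the client is lagging behind.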
Can you provide the [clock] and [file:/nfs/...] sections of your client data message for an instance when the underflow has occurred?
Yep:
[file:/nf/2020/08/12/nfcapd.202008122015]
type:100000 (file)
mode:644 (-rw-r--r--)
linkcount:1
owner:56137 (nfsen)
group:48 (apache)
size:232673177
clock:1597256407 (2020/08/12-20:20:07)
atime:1597256114 (2020/08/12-20:15:14)
ctime:1597256414 (2020/08/12-20:20:14)
mtime:1597256414 (2020/08/12-20:20:14)
[...]
[clock]
epoch: 1597256407.516640
local: 2020-08-12 20:20:07 CEST
UTC: 2020-08-12 18:20:07 GMT
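To make the numbers in this section concrete, here is a small Python sketch that pulls the epoch values out of the [file:] section and computes the file age as "clock minus mtime". The wrap to 32 bits is only an assumption about how such a huge value could arise; it has not been checked against the Xymon source, and parse_stamps is a hypothetical helper.

```python
import re

# Timestamp lines copied from the [file:] section above.
FILE_SECTION = """\
clock:1597256407 (2020/08/12-20:20:07)
atime:1597256114 (2020/08/12-20:15:14)
ctime:1597256414 (2020/08/12-20:20:14)
mtime:1597256414 (2020/08/12-20:20:14)
"""

def parse_stamps(section):
    """Map each 'key:epoch (text)' line to {key: epoch as int}."""
    pairs = re.findall(r"^(\w+):(\d+) \(", section, re.M)
    return {key: int(value) for key, value in pairs}

stamps = parse_stamps(FILE_SECTION)

# The mtime is 7 seconds later than the client's own clock value.
signed_age = stamps["clock"] - stamps["mtime"]

# ASSUMPTION: emulate storing that difference in an unsigned 32-bit integer.
wrapped_age = signed_age & 0xFFFFFFFF

print(signed_age, wrapped_age)
```

With these values the signed age is negative, and the emulated unsigned value is just below 2^32.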
(BTW, is it normal that the [clientversion] section is empty?)
It's hard to see how a delay in the collection of file timestamps could cause this underflow, as the creation of the [clock] section (the source of "now") is executed after the creation of the [file:] section (the source of the MTIME value).
Good point. Judging solely from the facts above, I would have expected it to be evaluated (but not printed) first.
Then again, due to that empty [clientversion] section, I checked the actually installed xymon-client version and it's a horribly old client version (4.3.10). (It seems as if I really should get rid of manually installed servers or align those boxes even more with Ansible. :-)
So forget this if this is solely caused by the ancient client. (And sorry for the noise because I just gave the server version and didn't even think to check the client version.)
However, if the [clock] section of the client message does not exist, the Xymon server will use its own time for its calculations.
It does exist.
In such cases this integer underflow happens and causes a false positive, because instead of being negative, the time difference is insanely huge:
No matter the cause, Xymon should take into account the possibility that the timestamp is in the future, and at least show "was modified N seconds into the future" or something similar, after handling or avoiding the underflow.
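A minimal Python sketch of that suggestion: compute the difference as a signed value and branch on its sign before formatting. The function name describe_age and the "seconds ago" wording are hypothetical; only the "into the future" phrasing comes from the mail above, and this is not how Xymon itself is implemented.

```python
def describe_age(now, mtime):
    """Return a human-readable age, treating future timestamps explicitly."""
    age = now - mtime  # signed arithmetic, so a future mtime gives age < 0
    if age < 0:
        return f"was modified {-age} seconds into the future"
    return f"was modified {age} seconds ago"

# Values taken from the client data earlier in this mail.
print(describe_age(1597256407, 1597256414))
print(describe_age(1597256407, 1597256114))
```

Checking the sign first means the negative difference never has to be stored in an unsigned variable.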
Ack. Files with timestamps in the future could be copied from other machines onto (or mounted on) a system with the proper time. Maybe there are even applications where future timestamps make sense, e.g. if they're being misused for storing some date or so, who knows.
Perhaps create the following entry in clientlocal.cfg:
file:
exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ; ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1; $XYMONHOME/bin/logfetch --clock; date
The output file /tmp/clock-test might give you some idea of what's going on. Remember that it can take up to 10 minutes for updates to clientlocal.cfg to take effect on the client.
Will do in case the other facts above do not suffice already.
Kind regards, Axel
-- 
PGP: 2FF9CD59612616B5          /~\  Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: abe at deuxchevaux.org   \ /  Say No to HTML in E-Mail and Usenet
Mail+Jabber: abe at noone.org   X
https://axel.beckert.ch/       / \  I love long mails: https://email.is-not-s.ms/