Hi,
at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an issue with an integer underflow in the FILE mtime/ctime/atime check which basically seems to happen when finding the file to check takes a moment (seconds) and the file in question has been modified after the search for the file has been started.
In such cases this integer underflow happens and causes a falso positive due to instead of the time difference being negative, it's insanely huge:
[red] /nf/2020/08/12/nfcapd.202008122015 File was modified 4294967289 seconds ago - should be <360 File was modified 4294967289 seconds ago - should be <960
So far for me this only happened, when
client-local.cfg contains
...commands to find the file, in our example it isfile:
ls -1 $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1to find the newest file under /nf/20??/ by first looking for the most recent directory and then finding the most recent file (both by just sorting them alphabetically) as listing all files would take much too long, hence the nested
... $() ....The corresponding entries in analysis.cfg are
FILE %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<360 SIZE>9000000 yellow FILE %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<960 SIZE>1500 red
On a machine with slow I/O, in our case on a not that strong VM used for testing.
I tried to reproduce this on a Raspberry Pi with a much simpler setup and Xymon 4.3.30, but failed so far and don't know, why:
client-local.cfg entry:
file:
sleep 90; echo /run/xymon-integer-underflow-buganalysis.cfg entries:
FILE /run/xymon-integer-underflow-bug red mtime<61 FILE /run/xymon-integer-underflow-bug yellow mtime>0
I then setup a cron job, touching the file every minute. Hence the 61 in the checks above.
Unfortunately this didn't suffice to trigger this issue. It is unclear to me why as the only explanation I have is the additional time needed to find the appropriate file to test. And the relevant code doesn't seem to have changed between 4.3.28 and 4.3.30...
So I cheated a bit and setup the cron job to always set the file stamp to one hour in the future using "touch -d 'next hour' . Then I at least could trigger that integer underflow:
red /run/xymon-integer-underflow-bug File was modified 4294963714 seconds ago - should be <61
I think there are several ways to fix this:
Move capturing the "now" timestamp just before checking the files timestamps, not before finding that file. Haven't noticed where the actual stat call is, though.
Use an signed instead of an unsigned integer to properly handle negative time differences. Tried to implement this as a patch as it seemed easy enough.
So far I managed to get this far:
[red] /run/xymon-integer-underflow-bug File was modified -3592 seconds ago - should be <61
So I guess that patch still needs more finetuning. Will send it once I got it fully working as I expect. But if there's interest in the incomplete patch already, I'm happy to send it anyways. (Respectively it is already online at https://salsa.debian.org/debian/xymon/-/blob/files-integer-undeflow/debian/p...)
Kind regards, Axel-- PGP: 2FF9CD59612616B5 /~\ Plain Text Ribbon Campaign, http://arc.pasp.de/ Mail: abe at deuxchevaux.org \ / Say No to HTML in E-Mail and Usenet Mail+Jabber: abe at noone.org X https://axel.beckert.ch/ / \ I love long mails: https://email.is-not-s.ms/
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <abe at deuxchevaux.org> wrote:
Hi,
at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an issue with an integer underflow in the FILE mtime/ctime/atime check which basically seems to happen when finding the file to check takes a moment (seconds) and the file in question has been modified after the search for the file has been started.
Is it possible that the time on the host being monitored is in the future, at least compared to the time of the Xymon server?
Can you provide the [clock] and [file:/nfs/...] sections of your client data message for an instance when the underflow has occurred?
It's hard to see how a delay in the collection of file timestamps could cause this underflow, as the creation of the [clock] section (the source of "now") is executed after the creation of the [file:] section (the source of the MTIME value). Essentially it's:
- create tempfile
- echo "client" header >> tempfile
- run OS script >> tempfile
- run "logfetch" to generate [file:] sections >> tempfile
- echo client version section >> tempfile
- run "logfetch --clock" to gelerate [clock] section >> tempfile
- send tempfile to Xymon server
What you're describing is a situation where the time generated at step 4 is ahead of the time generated at step 6. I think this can only happen if the timestamp of the file is in the future.
However, If the [clock] section of the client message does not exist, the the Xymon server will use its own time for its calculations.
In such cases this integer underflow happens and causes a falso
positive due to instead of the time difference being negative, it's insanely huge:
No matter the cause, Xymon should take into account the possibility that the timestamp is in the future, and at least show "was modified N seconds into the future" or something similar, after handling or avoiding the underflow.
Perhaps create the following entry in clientlocal.cfg:
file:exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ; ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1; $XYMONHOME/bin/logfetch --clock; date
The output file /tmp/clock-test might give you some idea of what's going on. Remember that it can take up to 10 minutes for updates to clientlocal.cfg to take effect on the client.
Cheers Jeremy
Hi Jeremy,
thanks a lot for your outside view. Maybe I was too focussed on the integer underrun to not see potential other issues...
On Thu, Aug 13, 2020 at 01:59:50PM +1000, Jeremy Laidman wrote:
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <abe at deuxchevaux.org> wrote:
at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an issue with an integer underflow in the FILE mtime/ctime/atime check which basically seems to happen when finding the file to check takes a moment (seconds) and the file in question has been modified after the search for the file has been started.
Is it possible that the time on the host being monitored is in the future, at least compared to the time of the Xymon server?
Hmmmm, you've got a point there. I expected all our servers to have NTP, but this server in question was setup manually by a coworker and indeed has the wrong time ? but it's like 15 seconds behind the Xymon server (which definitely has NTP and I also compared it with two other machines ? then again the box in question has an ntp installed, but it wasn't running. I should probably monitor this. ;-)
Client with timestamp issues: Thu Aug 13 11:53:03 CEST 2020 Xymon Server: Thu 13 Aug 2020 09:53:18 AM UTC
I assume the different time zones (CEST being UTC+02:00) are no issue as we have quite some clients on CEST.
Here's again the same difference just in Unix Epoch:
Client: 1597312461 Server: 1597312477
Always about 15 to 16 seconds difference, but the client is lagging behind.
Can you provide the [clock] and [file:/nfs/...] sections of your client data message for an instance when the underflow has occurred?
Yep:
[file:/nf/2020/08/12/nfcapd.202008122015] type:100000 (file) mode:644 (-rw-r--r--) linkcount:1 owner:56137 (nfsen) group:48 (apache) size:232673177 clock:1597256407 (2020/08/12-20:20:07) atime:1597256114 (2020/08/12-20:15:14) ctime:1597256414 (2020/08/12-20:20:14) mtime:1597256414 (2020/08/12-20:20:14)
[?]
[clock] epoch: 1597256407.516640 local: 2020-08-12 20:20:07 CEST UTC: 2020-08-12 18:20:07 GMT
(BTW, is it normal that the [clientversion] section is empty?)
It's hard to see how a delay in the collection of file timestamps could cause this underflow, as the creation of the [clock] section (the source of "now") is executed after the creation of the [file:] section (the source of the MTIME value).
Good point. Solely from the facts above I'd expected that it is evaluated (but not printed) first.
Then again, due to that empty [clientversion] section, I checked the actually installed xymon-client version and it's a horribly old client version (4.3.10). (It seems as if I really should get rid of manually installed servers or align those boxes even more with Ansible. :-)
So forget this if this is solely caused by the ancient client. (And sorry for the noise because I just gave the server version and didn't even think to check the client version.)
However, If the [clock] section of the client message does not exist, the the Xymon server will use its own time for its calculations.
It does exist.
In such cases this integer underflow happens and causes a falso positive due to instead of the time difference being negative, it's insanely huge:
No matter the cause, Xymon should take into account the possibility that the timestamp is in the future, and at least show "was modified N seconds into the future" or something similar, after handling or avoiding the underflow.
Ack. Sometimes files copied from other machines with time in the future could be copied onto (or mounted on) a system with proper time. Maybe there are even applications where future timestamps make sense, i.e. if they're being misused for storing some date or so, who knows.
Perhaps create the following entry in clientlocal.cfg:
file:
exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ; ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1; $XYMONHOME/bin/logfetch --clock; dateThe output file /tmp/clock-test might give you some idea of what's going on. Remember that it can take up to 10 minutes for updates to clientlocal.cfg to take effect on the client.
Will do in case the other facts above do not suffice already.
Kind regards, Axel
-- PGP: 2FF9CD59612616B5 /~\ Plain Text Ribbon Campaign, http://arc.pasp.de/ Mail: abe at deuxchevaux.org \ / Say No to HTML in E-Mail and Usenet Mail+Jabber: abe at noone.org X https://axel.beckert.ch/ / \ I love long mails: https://email.is-not-s.ms/
participants (2)
-
abe@deuxchevaux.org
-
jeremy@laidman.org