How is the clock graph in the "trends" column generated?
Thank you, Jeremy, for your suggestion!
I have run this command on the client, but I don't know what conclusions I can draw from it. Here is the output (after dropping into a xymoncmd shell):
==================================================================
time /usr/libexec/xymon-client/xymon 10.12.12.44 "client/timetest zm1.i2cinc.com.Linux"
ignore mail.info
file:/etc/passwd:md5
file:/etc/shadow:md5
log:/var/adm/messages:10240
0.00user 0.00system 0:00.00elapsed 33%CPU (0avgtext+0avgdata 680maxresident)k
0inputs+0outputs (0major+200minor)pagefaults 0swaps
BTW I do see clock offsets as big as 45 seconds reported on the "CPU" page for this client.
Thanks.
On Wed, Jun 29, 2016 at 4:46 AM, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
On Tue, 28 Jun 2016, 23:27 Junaid Shahid <shahid.junaid at gmail.com> wrote:
Thanks, JC!
That makes it very clear how (and why) the CPU stats contain the server's timestamp.
I have checked: we are running version 4.3.21.
Now let's look at the reasons for skew:
a) your xymon server itself is wrong
Our server's time is correct (I have checked it manually multiple times, and also with "ntpstats"). Plus, we have 300+ clients under Xymon monitoring, and none of them exhibit any time skew in their Clock Offset trends.
b) you have a xymonproxy in the middle and messages are delayed getting to xymond
We don't use any xymonproxy.
c) your xymond_client process is backlogged with [client] messages
This can't be the reason either, because none of the other clients exhibit any noticeable skew in their respective Clock Offset trends.
d) your xymon server is overloaded and has a long period between transmission and TCP processing by xymond
This is unlikely too, as no other client shows any noticeable Clock Offset trend.
In our case, there is one specific server (out of 300+) whose clock offset trend oscillates between 2 and 15 seconds (like a sine wave). This machine's time is in perfect sync with our NTP server, though (there is no actual clock drift). The machine does sit in a somewhat complicated network topology (behind several layers of firewalls, load balancers, etc.). My only remaining guess is that this is due to its unusual network location. What do you think, JC?
I tend to agree. If it takes a few seconds to make a TCP connection to the xymon server and transmit the client message, you will see such a delay.
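A minimal Python sketch of that reasoning, assuming (as the thread suggests) that the reported offset is simply the server's clock at receipt minus the timestamp the client wrote into its message. The function name, sign convention and numbers are illustrative assumptions, not taken from the Xymon source:

```python
def apparent_offset(client_stamp: float, server_receipt: float) -> float:
    """Offset as the server would see it under this simplified model.

    client_stamp:   epoch seconds written into the message by the client.
    server_receipt: epoch seconds on the server clock when xymond handles it.
    """
    # Any delay between stamping and receipt (script runtime, TCP connect,
    # transmission, queuing) is indistinguishable from genuine clock skew.
    return server_receipt - client_stamp


# Clocks in perfect sync, but the message takes 15 s to reach xymond:
client_stamp = 1000.0
true_skew = 0.0
transit_delay = 15.0
server_receipt = client_stamp + true_skew + transit_delay
print(apparent_offset(client_stamp, server_receipt))  # 15.0
```

Under this model, a perfectly synchronised client still shows a 15-second offset if its message is delayed by 15 seconds.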
Try manually sending a client message and see how long it takes. Something like:
$ time $XYMON $XYMSRV "client/timetest $MACHINE.$SERVEROSTYPE"
(run within a xymoncmd shell on the client)
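If you want to separate connection setup from the rest of the exchange, here is a rough Python equivalent of timing that command. It assumes the server listens on Xymon's default port 1984 and that, like the `xymon` tool as I understand it, the sender closes its write side after the message and then reads the reply. `time_tcp_exchange` is a hypothetical helper, not part of Xymon:

```python
import socket
import time


def time_tcp_exchange(host: str, port: int, payload: bytes, timeout: float = 30.0):
    """Time one connect/send/receive exchange with a TCP server.

    Returns (connect_seconds, total_seconds, response_bytes).
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.sendall(payload)
        # Half-close so the server sees end-of-message, then read the reply.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    end = time.monotonic()
    return connected - start, end - start, b"".join(chunks)
```

For example, `time_tcp_exchange("10.12.12.44", 1984, b"client/timetest zm1.i2cinc.com.Linux")` would show how much of the total is spent just establishing the TCP connection through the firewalls and load balancers.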
J
-- Regards, Junaid Shahid, TODO:______
Well, that was rather quick: 0.0 seconds elapsed. So it doesn't look like the comms is causing your problem.
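To put that figure next to the offsets being reported, here is a small Python sketch. It pulls the wall-clock value out of GNU `time`'s default output (the `[hours:]minutes:seconds` field ending in `elapsed`) and compares it with the 45-second offset mentioned above. `gnu_time_elapsed` is a hypothetical helper:

```python
import re

# GNU time's %E field: [hours:]minutes:seconds, immediately followed by "elapsed".
_ELAPSED = re.compile(r"(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)elapsed")


def gnu_time_elapsed(output: str) -> float:
    """Return the elapsed wall-clock seconds reported by GNU time."""
    m = _ELAPSED.search(output)
    if m is None:
        raise ValueError("no elapsed field found")
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2))
    seconds = float(m.group(3))
    return hours * 3600 + minutes * 60 + seconds


sample = ("0.00user 0.00system 0:00.00elapsed 33%CPU (0avgtext+0avgdata "
          "680maxresident)k 0inputs+0outputs (0major+200minor)pagefaults 0swaps")
elapsed = gnu_time_elapsed(sample)
observed_offset = 45.0  # largest offset seen on the CPU page
print(f"send took {elapsed:.2f} s, {elapsed / observed_offset:.1%} of the offset")
```

A send time that is a negligible fraction of the offset is what rules out the comms path here.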
On Sat, 2 Jul 2016, 02:52 Junaid Shahid <shahid.junaid at gmail.com> wrote:
On Sat, Jul 2, 2016 at 7:36 AM Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
On Sat, 2 Jul 2016, 02:52 Junaid Shahid <shahid.junaid at gmail.com> wrote:
Thank you, Jeremy, for your suggestion!
I have run this command on the client, but I don't know what conclusions I can draw from it. Here is the output (after dropping into a xymoncmd shell):
0.00user 0.00system 0:00.00elapsed 33%CPU (0avgtext+0avgdata 680maxresident)k
0inputs+0outputs (0major+200minor)pagefaults 0swaps
OK, so what next? The likely causes seem to have been eliminated, or at least made unlikely.
What I'd do next is get a packet capture of the client message on both endpoints, with timestamps on the capture output, at a time when there's a significant time discrepancy. 45 seconds would be great, but anything more than a few seconds would do (an order of magnitude longer than the expected runtime of the xymonclient.sh script). Then I'd compare the capture timestamps with the time shown in the contents of the client message. This should hint at whether the time anomaly (sounds like a sci-fi plot device) arises at the client or at the server.
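That comparison can be sketched in Python, assuming all four values are epoch seconds (e.g. capture times from `tcpdump -tt` on each host) and that the reported offset is the server's time minus the client's stamp. `localise`, `Breakdown` and the field names are hypothetical. Only `client_lag` and `server_lag` are measured against a single clock; `wire_gap` mixes both clocks, so a large value there points at either the network or genuine skew:

```python
from dataclasses import asdict, dataclass


@dataclass
class Breakdown:
    client_lag: float  # stamp written -> packet leaves client (client clock only)
    wire_gap: float    # packet leaves client -> packet reaches server (both clocks)
    server_lag: float  # packet reaches server -> time xymond used (server clock only)

    def dominant(self) -> str:
        # Name of the component contributing most to the reported offset.
        parts = asdict(self)
        return max(parts, key=parts.get)


def localise(message_stamp: float, client_capture: float,
             server_capture: float, reported_offset: float) -> Breakdown:
    # Under the assumed sign convention, the server's reference time is
    # the client's stamp plus the reported offset.
    server_time = message_stamp + reported_offset
    return Breakdown(
        client_lag=client_capture - message_stamp,
        wire_gap=server_capture - client_capture,
        server_lag=server_time - server_capture,
    )


# Hypothetical numbers: the message is stamped at 1000.0 but only leaves
# the client 40 s later, and a 45 s offset is reported.
b = localise(1000.0, 1040.0, 1040.5, 45.0)
print(b.dominant())  # client_lag
```

By construction the three parts sum to the reported offset, so whichever dominates suggests where to look next.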
For extra points, trace the client side using strace/truss with timestamps enabled.
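Once you have such a trace, a short Python sketch like the one below can flag the longest quiet periods. It assumes the trace was written with `strace -f -ttt -o <file>`, so each line carries an epoch timestamp, optionally preceded by a PID. The regex is a best-effort guess at that format, `find_stalls` is a hypothetical helper, and truss output would need a different parser:

```python
import re
from typing import Iterable, List, Tuple

# Optional PID prefix ("1234  " or "[pid 1234] ") followed by the epoch
# timestamp that `strace -ttt` prints, then the rest of the line.
_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d+\.\d+)\s+(.*)$")


def find_stalls(trace_lines: Iterable[str],
                threshold_s: float = 1.0) -> List[Tuple[float, str, str]]:
    """Return (gap_seconds, line_before_gap, line_after_gap) for long gaps."""
    stalls = []
    prev_ts = prev_call = None
    for line in trace_lines:
        m = _LINE.match(line)
        if not m:
            continue  # skip anything that doesn't look like a trace line
        ts, call = float(m.group(1)), m.group(2)
        if prev_ts is not None and ts - prev_ts >= threshold_s:
            stalls.append((ts - prev_ts, prev_call, call))
        prev_ts, prev_call = ts, call
    return stalls
```

For instance, inside a xymoncmd shell you might run `strace -f -ttt -o /tmp/xymonclient.trace` followed by the path to xymonclient.sh on your system, then call `find_stalls(open("/tmp/xymonclient.trace"))`. The line just before a long gap is often the call that was blocking.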
Note that you don't need to wait (up to 5 minutes) for the client message to be transmitted. You can send a client message any time you like by running xymonclient.sh within xymoncmd. This also makes it easier to use strace/truss.
J
participants (2)
- jlaidman@rebel-it.com.au
- shahid.junaid@gmail.com