
Gaps in graphs


Carl.Melgaard＠STAB.RM.DK

4 Mar 2021

10:43 a.m.

Hi,

How serious are gaps in graphs, for instance disk graphs? Is a gap the same as potentially missing alerts on events?

Regards,

Carl Melgaard


jeremy＠laidman.org

5 Mar

1:37 a.m.

On Thu, 4 Mar 2021 at 21:44, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

Hi,

How serious are gaps in graphs, for instance disk graphs? Is a gap the same as potentially missing alerts on events?

Regards,

Carl Melgaard

Yes, usually gaps in graphs are caused by missing data points. In the case of the disk graph, this is usually caused by client data messages not being sent from the host to the Xymon server for some reason, such as stopping the Xymon client at just the wrong time. It's also possible that client data messages are not being sent in a timely manner: if two data points are fed into RRD within the same 5-minute interval, the second one is ignored, and then the next 5-minute interval will have no data point.
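As a rough illustration of that timing argument, here is a minimal Python sketch. It assumes a fixed 300-second step and only buckets arrival times; it is not how Xymon or RRDtool is implemented, and the function name and sample timestamps are made up for the example.

```python
from collections import Counter

STEP = 300  # seconds; the 5-minute interval discussed above


def bucket_report(arrivals, step=STEP):
    """Return (empty_buckets, crowded_buckets) for a list of arrival times.

    A bucket is identified by the start time of its interval. Crowded buckets
    received more than one sample (the extras would be ignored); empty buckets
    received none and would show up as a gap.
    """
    counts = Counter(t - (t % step) for t in arrivals)
    first, last = min(counts), max(counts)
    empty = [b for b in range(first, last + step, step) if b not in counts]
    crowded = sorted(b for b, n in counts.items() if n > 1)
    return empty, crowded


# Hypothetical arrivals: the third message shows up a few seconds early, so it
# shares an interval with the second one and leaves the following interval empty.
empty, crowded = bucket_report([10, 310, 599, 910, 1210])
print("empty intervals:", empty)
print("crowded intervals:", crowded)
```

Feeding real arrival times (for example, timestamps taken from the server logs) into the same function would show whether gaps line up with crowded intervals.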

One unlikely cause of missing graphs is that the client data message is being truncated. If the disk stats are after the point of truncation, then there are no data points to add to the RRD file, so you'll see a gap. I would check your xymond.log file for messages like "Oversize data/client msg from 10.1.1.1 truncated n=<msgsize>, limit <msglimit>)". If disk graphs are affected by a section earlier in the message, it's likely that other graphs are also affected by this - the [df] section is followed by [free] (memory), then [ifconfig] and all the other sections used for network stats. Perhaps scan down the graphs on the trends page looking for similar gaps.

I've seen client data message truncation cause missing data points, but it's actually unlikely that this is the cause of your problem. All of the client data sections that are likely to cause truncation come after the sections that are used for the standard graphs (including disk). But it couldn't hurt to check. Message limit defaults can be changed in the xymonserver.cfg file - search the man page for MAXMSG_CLIENT for more details.

If the cause is something else, I suspect you'll still find clues in your xymond.log file. But also check rrd-data.log and rrd-status.log.
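A minimal Python sketch of the log check suggested above. The pattern is built from the message text quoted earlier in this reply; the exact wording in a given Xymon version may differ, so treat the regular expression, the function name and the sample lines as assumptions.

```python
import re

# Assumed shape of the truncation message, based on the text quoted above.
OVERSIZE_RE = re.compile(
    r"Oversize (?P<kind>\S+) msg from (?P<sender>\S+) truncated"
    r".*?n=(?P<size>\d+), limit (?P<limit>\d+)"
)


def find_truncations(lines):
    """Yield (sender, kind, size, limit) for every truncation line found."""
    for line in lines:
        match = OVERSIZE_RE.search(line)
        if match:
            yield (match["sender"], match["kind"],
                   int(match["size"]), int(match["limit"]))


# Hypothetical log lines, for illustration only.
sample_lines = [
    "2021-03-03 01:24:19 Oversize client msg from 10.1.1.1 truncated (n=600000, limit 524288)",
    "2021-03-03 01:24:20 some unrelated log line",
]
hits = list(find_truncations(sample_lines))
print(hits)
```

To run it against a real log, pass an open file object (for example the xymond.log in your Xymon log directory) instead of sample_lines.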

Carl.Melgaard＠STAB.RM.DK

8:11 a.m.


It is indeed wrong with more graphs than just disk. I'll look around - thanks for the pointers.

Regards,

Carl

Carl.Melgaard＠STAB.RM.DK

10:48 a.m.


So, I looked through the logs. xymond.log doesn't point to anything besides the normal spam in there, except this one: Sending dropstate (from xymond) with xxx

But in the rrd-data.log and rrd-status.log I have this occurring (more than once):

rrd-data.log:
2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

rrd-status.log:
2021-03-03 02:55:15.003177 xxx/disk,tmpfs.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data

I recently updated to newest version of Xymon (from a very old version), and it seems I carried over some MAXMSG-settings:

MAXMSG_STATUS="5180590"
MAXMSG_CLIENT="5180590"
MAXMSG_DATA="5180590"
#MAXMSG_CLIENT=512 # clientdata messages (default=512k)
#MAXMSG_STATUS=256 # general "status" messages (default=256k)
#MAXMSG_DATA=256 # "data" messages, if enabled (default=256k)

And if Xymon now thinks these numbers are in kilobytes instead of bytes, I seem to have allowed A LOT more memory, perhaps?
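The size of that difference can be worked out with a few lines of Python. This is only arithmetic on the carried-over value; it assumes the "k" in the commented defaults means units of 1024 bytes, and it says nothing about how much memory xymond actually allocates.

```python
KIB = 1024  # assumed meaning of the "k" in the commented defaults


def maxmsg_bytes(setting, unit_is_kib=True):
    """Convert a MAXMSG_* value to bytes under one of the two readings."""
    value = int(setting)
    return value * KIB if unit_is_kib else value


carried_over = "5180590"
as_bytes = maxmsg_bytes(carried_over, unit_is_kib=False)
as_kib = maxmsg_bytes(carried_over, unit_is_kib=True)

print(f"read as bytes:     {as_bytes / KIB**2:.2f} MiB")
print(f"read as kilobytes: {as_kib / KIB**3:.2f} GiB")
```

Read as bytes, the setting is roughly 5 MiB per message type; read as kilobytes, it is roughly 5 GiB, about ten thousand times the 512k default shown in the comments.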

Regards,

Carl Melgaard

Scot.Kreienkamp＠la-z-boy.com

3:04 p.m.

2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

That usually happens because graphs by default store data once every 5 minutes. However, if it receives data more often, say every minute, then it can't store that in the RRD because it can only store one data point every 5 minutes. Since it's receiving data more often than once in the 5-minute window, the RRD backend triggers that message.

Scot Kreienkamp

jeremy＠laidman.org

6 Mar

7:31 a.m.

On Sat, 6 Mar 2021 at 02:04, Scot Kreienkamp <Scot.Kreienkamp at la-z-boy.com> wrote:

...

2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

That usually happens because graphs by default store data once every 5 minutes. However, if it receives data more often, say every minute, then it can't store that in the RRD because it can only store one data point every 5 minutes. Since it's receiving data more often than once in the 5-minute window, the RRD backend triggers that message.

Yes, what Scot said. This could mean you have two different sources of both client data messages and status messages, as if you have two copies of the Xymon client running on the host being monitored.

However, duplicate messages would not, in themselves, lead to missing data. They would only cause the extra data to be dropped; the first data point in a 5-minute window would still be accepted, leaving no gaps in the data. The only way I can see this being a symptom of a problem that also causes gaps in graphs is if the clock on the host is jittering wildly (such as if it was a VM on a heavily-loaded host server), causing some sequential messages to arrive at the Xymon server too close together. This is quite unlikely, so I'm not sure this is related to the gaps. Instead, you might just have two problems to solve: duplicate data sources, and as-yet unexplained gaps in your graphs.

Are you receiving these "duplicate RRD data" messages every 5 minutes, or only occasionally (such as when you're seeing gaps in your graphs)?

It might be helpful to see one of your graphs with gaps in it.

Also, can you provide maybe 10 sequential log messages with the "duplicate RRD data" in them? I'd like to get a sense of their regularity and frequency.

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]
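For a long dump, the same check can be scripted. The following Python sketch classifies rows of "rrdtool fetch" text output; it assumes the usual layout of a header line with data-source names followed by "timestamp: value value ..." rows, and the sample text (including the data-source name) is made up.

```python
import math


def classify_rows(fetch_output):
    """Sort rows of 'rrdtool fetch' text into all-NaN, all-zero and other."""
    result = {"nan": [], "zero": [], "ok": []}
    for line in fetch_output.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header line or blank line
        values = [float(v) for v in rest.split()]  # float() accepts "nan" and "-nan"
        if not values:
            continue
        ts = int(stamp)
        if all(math.isnan(v) for v in values):
            result["nan"].append(ts)
        elif all(v == 0.0 for v in values):
            result["zero"].append(ts)
        else:
            result["ok"].append(ts)  # includes rows that are only partly NaN
    return result


# Hypothetical output for a single data source.
sample = """\
                    example_ds

1614730800: 1.2000000000e+01
1614731100: nan
1614731400: 0.0000000000e+00
1614731700: -nan
"""
rows = classify_rows(sample)
print(rows)
```

Rows reported under "nan" in the middle of the range are genuine missing data points; rows under "zero" are stored values that happen to be zero.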

Cheers Jeremy

Carl.Melgaard＠STAB.RM.DK

8 Mar

8:13 a.m.

On Sat, 6 Mar 2021 at 02:04, Scot Kreienkamp <Scot.Kreienkamp at la-z-boy.com> wrote:

2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

...

Are you receiving these "duplicate RRD data" messages every 5 minutes, or only occasionally (such as when you're seeing gaps in your graphs)? It might be helpful to see one of your graphs with gaps in it. Also, can you provide maybe 10 sequential log messages with the "duplicate RRD data" in them? I'd like to get a sense of their regularity and frequency.

2021-03-03 01:24:19.002264 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data
2021-03-03 02:55:15.002852 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data
2021-03-04 10:01:17.004140 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614848477, different data
2021-03-05 14:15:25.007389 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data
2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data
2021-03-05 22:56:18.014486 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data
2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data
2021-03-06 12:30:28.002023 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data
2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data

...

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]

Currently I can't actually find a graph with a gap in it. I just noticed because it happened on the Xymon server itself. On my old setup, it never happened.

Also in xymonclient.log I get these quite alot, dunno if it's related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

Regards,

Carl

jeremy＠laidman.org

10:29 p.m.

On Mon, 8 Mar 2021 at 19:21, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

...
Are you receiving these "duplicate RRD data" messages every 5 minutes, or only occasionally (such as when you're seeing gaps in your graphs)?

...
It might be helpful to see one of your graphs with gaps in it.

...
Also, can you provide maybe 10 sequential log messages with the "duplicate RRD data" in them? I'd like to get a sense of their regularity and frequency.

2021-03-03 01:24:19.002264 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

2021-03-03 02:55:15.002852 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data

2021-03-04 10:01:17.004140 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614848477, different data

2021-03-05 14:15:25.007389 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data

2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data

2021-03-05 22:56:18.014486 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data

2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data

2021-03-06 12:30:28.002023 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data

2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data

Interesting. It seems to be a rare occurrence - no more than two duplicate data points in a day - almost too few to notice. Are your gaps more than 5 minutes (one sample) long? It might be helpful for you to include an example gappy graph for us to see.

Some of these errors relate to netstat and others to ifstat processing. Both parsers receive data from the same client data message. Interestingly, only some of the errors for netstat.rrd coincide with ones for ifstat.rrd. The matching timestamps mean this is unlikely to be a coincidence, but I'm not sure what to make of it TBH.
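The pairing is easier to see when the warnings are grouped by RRD timestamp. A small Python sketch follows, using the log lines posted in this thread; the regular expression and helper name are assumptions made for the example.

```python
import re
from collections import defaultdict

DUP_RE = re.compile(
    r"(?P<rrd>\S+\.rrd): Bug - duplicate RRD data with same timestamp (?P<ts>\d+)"
)


def group_by_timestamp(lines):
    """Map each RRD timestamp to the list of RRD files that reported it."""
    groups = defaultdict(list)
    for line in lines:
        match = DUP_RE.search(line)
        if match:
            groups[int(match["ts"])].append(match["rrd"])
    return dict(groups)


tail = ": Bug - duplicate RRD data with same timestamp {}, different data"
log_lines = [
    "2021-03-03 01:24:19.002264 x/netstat.rrd" + tail.format(1614731059),
    "2021-03-03 02:55:15.002852 x/netstat.rrd" + tail.format(1614736515),
    "2021-03-04 10:01:17.004140 x/netstat.rrd" + tail.format(1614848477),
    "2021-03-05 14:15:25.007389 x/netstat.rrd" + tail.format(1614950125),
    "2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd" + tail.format(1614950125),
    "2021-03-05 22:56:18.014486 x/netstat.rrd" + tail.format(1614981378),
    "2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd" + tail.format(1614981378),
    "2021-03-06 12:30:28.002023 x/netstat.rrd" + tail.format(1615030228),
    "2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd" + tail.format(1615030228),
]

groups = group_by_timestamp(log_lines)
netstat_only = sorted(ts for ts, files in groups.items() if files == ["x/netstat.rrd"])
paired = sorted(ts for ts, files in groups.items() if len(files) > 1)
print("netstat only:", netstat_only)
print("netstat + ifstat:", paired)
```

On these nine lines, the three earliest warnings are for netstat.rrd alone and the three later ones come as netstat/ifstat pairs.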

...

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]

Currently I can't actually find a graph with a gap in it. I just noticed because it happened on the Xymon server itself. On my old setup, it never happened.

OK. I think your best bet to diagnose is going to be correlating log messages or other events to the gaps.

You mentioned an "old setup". Can you describe what has changed from old to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

You said that you noticed on the Xymon server itself. Has it only happened to graphs for the Xymon server? I'm wondering if you have the Xymon client AND the Xymon server both running on the same host?

...

Also in xymonclient.log I get these quite alot, dunno if its related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

Can you explain "quite alot"? Can you give an indication of how often these occur?

This might very well be related. The logfetch and vmstat files are created during the construction of the client data message. It's likely that some, if not all, of the client data message will be missing, when these logs show up.

I'd be trying to correlate these log messages with the times that you get gaps in your graphs. If they match, then it looks to be a problem with the Xymon client.

Carl.Melgaard＠STAB.RM.DK

9 Mar

7:39 a.m.

On Mon, 8 Mar 2021 at 19:21, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]

Currently I can't actually find a graph with a gap in it. I just noticed because it happened on the Xymon server itself. On my old setup, it never happened.

...

OK. I think your best bet to diagnose is going to be correlating log messages or other events to the gaps. You mentioned an "old setup". Can you describe what has changed from old to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

I changed OS, CentOS 5.11 -> RH 7.9, and Xymon from 4.3.7 to 4.3.30, and changed from self-compiled to Terabithia packages. So quite a big jump. Yes, client and server both run on the same host, as they did on the old system. I want the Xymon server itself monitored. I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes to the secondary. Only the secondary is updated as of yet.

...

You said that you noticed on the Xymon server itself. Has it only happened to graphs for the Xymon server? I'm wondering if you have the Xymon client AND the Xymon server both running on the same host?

After I noticed it on the Xymon server itself, I went looking for gaps elsewhere, and I found some on other servers as well.

Also in xymonclient.log I get these quite alot, dunno if its related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

...

Can you explain "quite alot"? Can you give an indication of how often these occur?

623 lines in the logfile yesterday.

Regards,

Carl

jeremy＠laidman.org

10:53 p.m.

On Tue, 9 Mar 2021 at 18:47, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

...
You mentioned an "old setup". Can you describe what has changed from old to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

I changed OS, CentOS 5.11 -> RH 7.9 and Xymon from 4.3.7 to 4.3.30, and changed from selfcompiled to Terabithia-packages. So quite a big jump.

Yes, client and server both run on the same host, as they did on the old system. I want the Xymon server itself monitored.

Yep, that makes sense. My curiosity around this is the possibility that the Xymon server is running the client scripts from its clientlaunch process, and also a second copy of clientlaunch is running the same scripts - in essence, a "server" instance of the client scripts, as well as a "client" instance of the client scripts. If this is happening, you'll get two data messages every 5 minutes instead of one.

Again, I don't think this would cause graph gaps, but it might be causing some of your warning logs.

Interestingly, Terabithia packages for Xymon up to v4.3.18 included both client and server components in the one "xymon" package, as well as in the "xymon-client" package. You would only install "xymon" or "xymon-client" but not both (or you might get duplicate clients running). However, from v4.3.18, the client files in the xymon package were removed, requiring both "xymon" and "xymon-client" to be installed on the Xymon server (if you wanted the server to monitor itself). You appear to have both packages installed on your Xymon server.

I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes to the secondary. Only the secondary is updated as of yet.

It makes sense to have two for redundancy. Have you thought about configuring both Xymon servers in each client? That way, if the primary goes down, the secondary will still receive updates. (This has nothing to do with diagnosing the gaps in your graphs, I'm just curious.)

...

You said that you noticed on the Xymon server itself. Has it only happened to graphs for the Xymon server? I'm wondering if you have the Xymon client AND the Xymon server both running on the same host?

After I noticed it on the Xymon server itself, I went looking for gaps elsewhere, and I found some on other servers as well.

Right, so the gaps aren't likely to be caused by client and server running together, if it's also happening for other servers not running the Xymon server. But this might be the cause of your RRD warnings.

Also in xymonclient.log I get these quite alot, dunno if its related:

...

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

Do you only see these on the Xymon server, or are these log messages also showing on Xymon clients? And if so, at what frequency?

...

...
Can you explain "quite alot"? Can you give an indication of how often these occur?

623 lines in the logfile yesterday.

That's roughly 2 every 5-minute interval. That's significant.
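The arithmetic behind that estimate, as a short Python check (the only input is the 623-line figure reported above):

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 5

intervals_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
log_lines = 623  # lines reported for one day of xymonclient.log

per_interval = log_lines / intervals_per_day
print(f"{intervals_per_day} intervals per day, {per_interval:.2f} lines per interval")
```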

Your symptoms (xymonclient.log messages, RRD warnings) are consistent with two different instances of the Xymon client script running at the same time. When this happens, each instance tries to create and populate xymon_vmstat.<servername> (from a vmstat command) and include its contents in the client status message before removing the file. Usually the file only exists for a brief moment. If two instances of the client are running, it's unlikely that both would create the file, and then try to use it, at the same time. But if it did happen, one instance would likely show the "No such file or directory" message, because the other instance had already removed the file. A classic "race condition".

Similarly, the Xymon client script creates the logfetch.<servername>.cfg.tmp file, then renames it to logfetch.<servername>.cfg. If a second instance tries to rename the file after the first instance has already done so, then you'll see the "No such file or directory".
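The rename half of that race can be reproduced in a few lines of Python. This is a deliberately sequential sketch: it forces the unlucky ordering instead of running two real processes, and the file names are placeholders rather than the ones the Xymon client uses.

```python
import os
import tempfile


def publish(tmp_path, final_path):
    """Rename tmp_path to final_path; report a missing source instead of raising."""
    try:
        os.rename(tmp_path, final_path)
        return "renamed"
    except FileNotFoundError:
        return "No such file or directory"


with tempfile.TemporaryDirectory() as workdir:
    tmp = os.path.join(workdir, "logfetch.example.cfg.tmp")
    final = os.path.join(workdir, "logfetch.example.cfg")

    # Instance A and instance B both write the same temporary file...
    for _instance in ("A", "B"):
        with open(tmp, "w") as handle:
            handle.write("config\n")

    # ...then each tries to rename it. Whoever comes second finds it gone.
    outcomes = [publish(tmp, final), publish(tmp, final)]

print(outcomes)
```

Giving each instance its own temporary name (for example by appending a process ID) would avoid the collision, but here the simpler fix is to run only one instance.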

Can you show me the output of the following commands. I'm running this on one of my Xymon servers (using Terabithia RPMs) to show what you might expect:

$ pgrep -lf xymonlaunch
16602 /usr/lib/xymon/server/bin/xymonlaunch --config=/usr/lib/xymon/server/etc/tasks.cfg --env=/usr/lib/xymon/server/etc/xymonserver.cfg --log=/var/log/xymon/xymonlaunch.log --pidfile=/var/log/xymon/xymonlaunch.pid

$ pgrep -lf vmstat
8304 sh -c vmstat 300 2 1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>
8306 vmstat 300 2

Cheers Jeremy

Carl.Melgaard＠STAB.RM.DK

10 Mar

8:01 a.m.

On Tue, 9 Mar 2021 at 18:47, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes to the secondary. Only the secondary is updated as of yet.

It makes sense to have two for redundancy. Have you thought about configuring both Xymon servers in each client? That way, if the primary goes down, the secondary will still receive updates. (This has nothing to do with diagnosing the gaps in your graphs, I'm just curious.)

Yeah, I already do this on all my xymon clients.

Also in xymonclient.log I get these quite alot, dunno if its related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

...

Do you only see these on the Xymon server, or these log messages also showing on Xymon clients? And if so, at what frequency?

I don't see them anywhere other than on the updated Xymon server.

623 lines in the logfile yesterday.

...

That's roughly 2 every 5-minute interval. That's significant.

Can you show me the output of the following commands. I'm running this on one of my Xymon servers (using Terabithia RPMs) to show what you might expect:

$ pgrep -lf xymonlaunch
16602 /usr/lib/xymon/server/bin/xymonlaunch --config=/usr/lib/xymon/server/etc/tasks.cfg --env=/usr/lib/xymon/server/etc/xymonserver.cfg --log=/var/log/xymon/xymonlaunch.log --pidfile=/var/log/xymon/xymonlaunch.pid

$ ps -ef | grep xymonlaunch
xymon 1084 1 0 Jan19 ? 00:01:24 /usr/sbin/xymonlaunch --no-daemon --log=/var/log/xymon/xymonlaunch.log

...

$ pgrep -lf vmstat
8304 sh -c vmstat 300 2 1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>
8306 vmstat 300 2

$ ps -ef | grep vmstat
xymon 14896 14893 0 08:50 ? 00:00:00 vmstat 300 2
xymon 14904 14898 0 08:50 ? 00:00:00 vmstat 300 2

I noticed these 2 running, and couldn't figure out how both were spawned. Maybe I should set "DISABLED" on the client part in clientlaunch.cfg - I see now that there's actually a xymonclient part in tasks.cfg. There we have the 2 instances!

I guess I'll try that :) Thanks for pointing me right at the answer! Now I just have to figure out why the new server is eating up 10 times more RAM than the old server, with the same number of hosts monitored.

Regards,

Carl

jeremy＠laidman.org

10:02 a.m.

On Wed, 10 Mar 2021 at 19:02, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

...
$ pgrep -lf vmstat

...
8304 sh -c vmstat 300 2 1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>

...
8306 vmstat 300 2

$ ps -ef | grep vmstat

xymon 14896 14893 0 08:50 ? 00:00:00 vmstat 300 2

xymon 14904 14898 0 08:50 ? 00:00:00 vmstat 300 2

I noticed these 2 running, and couldn't figure out how both were spawned. Maybe I should set "DISABLED" on the client part in clientlaunch.cfg - I see now that there's actually a xymonclient part in tasks.cfg. There we have the 2 instances!

Yes, that'd be it. Disable one of those.
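For anyone wanting to script the same check, here is a rough Python sketch that groups matching processes by parent PID from "ps -ef" text. It assumes the usual eight-column "ps -ef" layout; the helper name is made up, and the sample rows are the two posted above.

```python
def pids_by_parent(ps_output, needle):
    """Map parent PID -> PIDs for 'ps -ef' rows whose command contains needle."""
    parents = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed row
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if needle in cmd:
            parents.setdefault(ppid, []).append(pid)
    return parents


ps_sample = """\
xymon 14896 14893 0 08:50 ? 00:00:00 vmstat 300 2
xymon 14904 14898 0 08:50 ? 00:00:00 vmstat 300 2
"""
vmstat_parents = pids_by_parent(ps_sample, "vmstat 300 2")
print(vmstat_parents)
```

Two different parent PIDs for the same vmstat command is what you would expect to see when two copies of the client script are running.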

The clientlaunch.cfg file comments say:

Note: On the Xymon server itself, this file is normally
NOT used. Instead, both the client- and server-tasks
are controlled by the tasks.cfg file.

On a client, the clientlaunch.cfg file is loaded by /usr/lib/xymon/*client*/bin/xymonlaunch (note the "client" rather than the "server" in the path). The client instance has "--config=/usr/lib/xymon/client/etc/clientlaunch.cfg" as a parameter, to use the contents of that file. It's not usual for this instance of xymonlaunch to run on a Xymon server.

Your symptoms suggest that you have a client instance "xymonlaunch --config=...client/etc/clientlaunch.cfg" as well as the server instance. However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how the clientlaunch.cfg file is being processed.

I guess I'll try that :) Thanks for pointing me right at the answer! Now I just have to figure out why the new server is eating up 10 times more RAM than the old server, with the same number of hosts monitored.

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Cheers Jeremy

Carl.Melgaard＠STAB.RM.DK

12:53 p.m.

On Wed, 10 Mar 2021 at 19:02, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

I noticed these 2 running, and couldn't figure out how both were spawned. Maybe I should set "DISABLED" on the client part in clientlaunch.cfg - I see now that there's actually a xymonclient part in tasks.cfg. There we have the 2 instances!

...

Yes, that'd be it. Disable one of those. The clientlaunch.cfg file comments say:

Note: On the Xymon *server* itself, this file is normally
NOT used. Instead, both the client- and server-tasks
are controlled by the tasks.cfg file.

Yup, I disabled it in clientlaunch and it works.

...

Your symptoms suggest that you have a client instance "xymonlaunch --config=...client/etc/clientlaunch.cfg" as well as the server instance. However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how the clientlaunch.cfg file is being processed.

I think it runs from a service, as xymon-client is a separate package.

...

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Is there any way to verify that this is indeed the case?

Regards,

Carl

jeremy＠laidman.org

10:48 p.m.

On Wed, 10 Mar 2021 at 23:53, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

...
Your symptoms suggest that you have a client instance "xymonlaunch --config=...client/etc/clientlaunch.cfg" as well as the server instance. However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how the clientlaunch.cfg file is being processed.

I think it runs from a service, as xymon-client is a separate package.

Hmm, the Terabithia package postinstall script (from the output of "rpm -q --scripts xymon-client") shows:

# This is a hack, but we don't want to double-bounce xymonlaunch,
# so let the server package handle it if both are installed...
if [ ! -e "/etc/xymon/xymonserver.cfg" ] ; then
# add unit file or init script; restart if already running
if [ $1 -eq 1 ] ; then
    # Initial installation
    systemctl preset xymonlaunch.service >/dev/null 2>&1 || :
fi

This tells me that the client package first checks for the server being installed (by checking for the existence of /etc/xymon/xymonserver.cfg) and if it is not, it creates its own service; otherwise it assumes the server package will do the needful. However, this would probably not work if you installed the client package before the server package, and could cause both services to be installed. Either way, you should be able to rectify the situation with the appropriate "systemctl" command.

I guess I'll try that :-) Thanks for pointing me right at the answer! Now I just have to figure out why the new server is eating up 10 times more RAM than the old server, with the same number of hosts monitored.

...
I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Is there any way to verify that this is indeed the case?

Memory management and monitoring is something I don't know much about - I know just enough to know how little I know.

The "real" memory usage graph is seemingly a good indication of actual RAM utilisation because it excludes buffers/caches. If you're concerned about Xymon using too much RAM, to the point where it could affect your server's performance or stability, then I'd recommend opening a new thread to discuss it.
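One way to cross-check this outside Xymon is to take buffers and cache out of the totals in /proc/meminfo by hand. The sketch below assumes the classic "used minus buffers/cache" calculation; it is not claimed to be the exact formula behind Xymon's "real" memory graph, and the function name real_used_kb is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/meminfo style text on stdin and prints
# the kB in use once free memory, buffers and page cache are subtracted.
real_used_kb() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { printf "%d\n", total - free - buffers - cached }
    '
}
```

Running "real_used_kb < /proc/meminfo" on the old and the new server gives two numbers that can be compared directly. If they are close while the raw "used" figures differ a lot, the difference is most likely buffers and cache. "free -m" shows the same fields in a more readable form.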

Cheers,
Jeremy

Carl.Melgaard＠STAB.RM.DK

11 Mar 11 Mar

7:43 a.m.

...

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Is there any way to verify that this is indeed the case?

Memory management and monitoring is something I don't know much about - I know just enough to know how little I know.

The "real" memory usage graph is seemingly a good indication of actual RAM utilisation because it excludes buffers/caches. If you're concerned about Xymon using too much RAM, to the point where it could affect your server's performance or stability, then I'd recommend opening a new thread to discuss it.

It is indeed indicated in "real" memory. I already did open a new (older) thread, which didn't get much attention, maybe 2 replies. The mailing list is almost dead, so I know it's a long shot to get help.

Regards,

Carl

jeremy＠laidman.org

10:52 a.m.

On Thu, 11 Mar 2021 at 18:44, Carl Melgaard <Carl.Melgaard at stab.rm.dk> wrote:

...

...
I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Is there any way to verify that this is indeed the case?

...
Memory management and monitoring is something I don't know much about - I know just enough to know how little I know.

...
The "real" memory usage graph is seemingly a good indication of actual RAM utilisation because it excludes buffers/caches. If you're concerned about Xymon using too much RAM, to the point where it could affect your server's performance or stability, then I'd recommend opening a new thread to discuss it.

It is indeed indicated in "real" memory. I already did open a new (older) thread, which didn't get much attention, maybe 2 replies. The mailing list is almost dead, so I know it's a long shot to get help.

The memory concern is probably a general Linux question rather than a Xymon-specific one. You might find more responses from a Linux forum.
