localmode, got over-size message, truncating
Hi,
I've got a problem with a client running in local mode:
from /var/log/xymon/xymonclient.log
2022-03-08 06:40:22.713067 Got over-size message, truncating at 528383 bytes (max: 524288)
2022-03-08 06:40:22.725069 Dropping (more) garbled data
I already increased the following values on the xymon server:
MAXMSG_CLIENT=2048
MAXMSG_STATUS=2048
but it does not seem to have any effect on my client: some checks (e.g. procs) still show up red or do not show all the data.
Is there any other value I have to adjust? Where is the limit of 524288 bytes defined on the client?
Thanks in advance!
Cheers Christoph
Christoph
There's no limit on the client side. The log "Got oversized message, truncating at ..." comes from xymond running on the Xymon server.
The limit for client messages (where your [ps] output is being truncated) is defined by MAXMSG_CLIENT, set in xymonserver.cfg, as an integer for the number of kibibytes (ie, it's multiplied by 1024). The default MAXMSG_CLIENT is 512 (meaning 524288 bytes).
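As a quick sanity check of the numbers, the KiB-to-bytes conversion can be done in a shell. This is only a sketch; the variable names are made up for illustration, and the two values are the default and the raised setting mentioned in this thread.

```shell
# Xymon's MAXMSG_* settings are given in KiB, so multiply by 1024 to get bytes.
default_kib=512     # default MAXMSG_CLIENT mentioned above
raised_kib=2048     # value set on the server in this thread

default_limit_bytes=$((default_kib * 1024))
raised_limit_bytes=$((raised_kib * 1024))

echo "default limit: ${default_limit_bytes} bytes"
echo "raised limit:  ${raised_limit_bytes} bytes"
```

The "max: 524288" in the log line matches the 512 KiB default, which suggests that whichever process wrote that line was not seeing the raised value.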
You've probably set the value correctly, but something else is preventing it from being used. You can confirm that it's set correctly with something like:
$ xymoncmd --env=/etc/xymon/xymonserver.cfg env | grep MAXMSG_CLIENT
MAXMSG_CLIENT=2048
If this gives the wrong value of 512, then there's something wrong with/in the file xymonserver.cfg. If this gives the correct value, your xymond probably just needs to be restarted so that it can pick up the configuration change.
On Linux you can view the environment of a running process in /proc/<pid>/environ. The entries in this pseudo-file are separated by null bytes, so running it through "strings" makes it more palatable:
$ sudo -u xymon strings /proc/$(pgrep -f '^xymond ')/environ | grep MAXMSG_CLIENT
MAXMSG_CLIENT=2048
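If "strings" is not available, the same lookup can be sketched with "tr", which turns the null separators into newlines. The helper name env_var_from_file is hypothetical, and the sample file below only stands in for a real /proc/<pid>/environ:

```shell
# Print one variable from a NUL-separated environment file such as /proc/<pid>/environ.
env_var_from_file() {
    # $1 = path to the environ-style file, $2 = variable name
    tr '\0' '\n' < "$1" | grep "^$2="
}

# Example on a hand-made file standing in for /proc/<pid>/environ:
sample=$(mktemp)
printf 'PATH=/usr/bin\0MAXMSG_CLIENT=2048\0MAXMSG_STATUS=2048\0' > "$sample"
env_var_from_file "$sample" MAXMSG_CLIENT
rm -f "$sample"
```

Against a live process you would pass the real path, e.g. env_var_from_file /proc/<pid>/environ MAXMSG_CLIENT, run as a user that is allowed to read it.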
If you don't get the value that's set in xymonserver.cfg, kill the xymond process and it'll restart using the current setting:
$ sudo -u xymon pkill -f '^xymond '
Cheers Jeremy
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
Hi Jeremy,
first of all: I solved it with your help, thanks!
On 08/03/2022 07:48, Jeremy Laidman wrote:
> Christoph
> There's no limit on the client side. The log "Got oversized message, truncating at ..." comes from xymond running on the Xymon server.
Thank you for confirming; I wasn't sure whether local mode changed any of that.
> The limit for client messages (where your [ps] output is being truncated) is defined by MAXMSG_CLIENT, set in xymonserver.cfg, as an integer for the number of kibibytes (ie, it's multiplied by 1024). The default MAXMSG_CLIENT is 512 (meaning 524288 bytes).
> You've probably set the value correctly, but something else is preventing it from being used. You can confirm that it's set correctly with something like:
> $ xymoncmd --env=/etc/xymon/xymonserver.cfg env | grep MAXMSG_CLIENT
> MAXMSG_CLIENT=2048
This indeed gives me the correct value of 2048.
> If this gives the wrong value of 512, then there's something wrong with/in the file xymonserver.cfg. If this gives the correct value, your xymond probably just needs to be restarted so that it can pick up the configuration change.
> On Linux you can view the environment of a running process in /proc/<pid>/environ. The entries in this pseudo-file are separated by null bytes, so running it through "strings" makes it more palatable:
> $ sudo -u xymon strings /proc/$(pgrep -f '^xymond ')/environ | grep MAXMSG_CLIENT
> MAXMSG_CLIENT=2048
This also showed 2048.
> If you don't get the value that's set in xymonserver.cfg, kill the xymond process and it'll restart using the current setting:
> $ sudo -u xymon pkill -f '^xymond '
Thanks for the hint. I had of course restarted the Xymon server, but some stray process seems to have survived that. After I manually killed all the remaining processes and restarted xymon, the errors in my log stopped and everything is working as expected.
Thank you very much!
Cheers Christoph
On 08/03/2022 07:57, Christoph Zechner wrote:
> Hi Jeremy,
> first of all: I solved it with your help, thanks!
It seems I celebrated prematurely: the errors are back in exactly the same way :-/
2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
2022-03-08 08:47:19.339786 Dropping (more) garbled data
I don't understand where this limit of 512 comes from; everything on the server checks out (2048 before, tried 4096 as well, no change).
Cheers Christoph
On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <zechner at vrvis.at> wrote:
> It seems I celebrated prematurely: the errors are back in exactly the same way :-/
> 2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
> 2022-03-08 08:47:19.339786 Dropping (more) garbled data
> I don't understand where this limit of 512 comes from; everything on the server checks out (2048 before, tried 4096 as well, no change).
I'm at a loss. If the xymond process is proven to have this set at 2048, then I see no reason why it would give that error message with that number.
Unless it's referring to another message type and hence a different maximum setting? Perhaps take a look at xymond's environment again, but search for all MAXMSG_ variables. See which one is set to 512, and that might be the culprit. The defaults for these max values are all different, with only two of them defaulting to 512: MAXMSG_CLIENT, MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one of them has been set to 512.
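A rough way to do that search in one go, assuming you can read the process environment file: the sketch below lists every MAXMSG_ variable and flags any that is still at 512. The function name list_maxmsg is made up for this example.

```shell
# List every MAXMSG_* variable in an environ-style file and flag any still at 512.
list_maxmsg() {
    # $1 = path to a NUL-separated environment file, e.g. /proc/<pid>/environ
    tr '\0' '\n' < "$1" | grep '^MAXMSG_' | while IFS='=' read -r name value; do
        if [ "$value" = "512" ]; then
            echo "$name=$value   <-- 512 KiB"
        else
            echo "$name=$value"
        fi
    done
}
```

Run it as a user that may read the file, for example list_maxmsg /proc/<pid>/environ with the pid of the process you suspect.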
The only other thing I can think of is that you have two copies of xymond running, somehow with different values of MAXMSG_CLIENT. But I can't think how this could come about. And you've already killed off any rogue processes.
Maybe run xymond in debug mode for one round of updates, until you get the "Got over-size message" and review the debug logs. This might provide enough additional detail to find out what's going on.
Another approach to the problem (a truncated client data message) is to modify the client script (e.g. xymonclient-linux.sh) to truncate the ps command output, so that the total message is smaller and hopefully fits within the maximum message size. This will mean that PROC checks might not work anymore (which is likely the case now anyway). But in the current state, monitoring of the sections that come after [ps] is likely broken too. On Linux these are notably the [top] and [vmstat] sections of the client data message, which are used for the "cpu" status and several metrics for graphing. Maybe something like adding "head -1000" will cut it down to a reasonable size:
echo "[ps]"
ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
Also, review the client data message before the [ps] section to see if there's actually something else pushing it over the limit, and [ps] just happens to be where the truncation happens.
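One way to review it, sketched under the assumption that you have saved a copy of the client data message to a file and that each section starts with a "[name]" line on its own: total up the bytes per section and sort by size. The function name section_sizes is hypothetical.

```shell
# Print the byte count of each [section] in a saved client data message, largest first.
section_sizes() {
    # $1 = file containing the client message (sections start with a "[name]" line)
    LC_ALL=C awk '
        /^\[[^]]+\]$/ { current = $0 }          # a new section header starts here
        { size[current] += length($0) + 1 }     # +1 accounts for the newline
        END {
            for (s in size)
                printf "%d %s\n", size[s], (s == "" ? "(header)" : s)
        }
    ' "$1" | sort -rn
}
```

Lines before the first section header are counted under "(header)". The section at the top of the output is the one contributing most to the message size, which need not be the one where the truncation becomes visible.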
J
On 09/03/2022 00:04, Jeremy Laidman wrote:
> Unless it's referring to another message type and hence a different maximum setting? Perhaps take a look at xymond's environment again, but search for all MAXMSG_ variables. See which one is set to 512, and that might be the culprit. The defaults for these max values are all different, with only two of them defaulting to 512: MAXMSG_CLIENT, MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one of them has been set to 512.
Thanks, I tried that, but unfortunately, this did not help, since all the values were set correctly, according to my config.
> The only other thing I can think of is that you have two copies of xymond running, somehow with different values of MAXMSG_CLIENT. But I can't think how this could come about. And you've already killed off any rogue processes.
Right, that's not it either. :-/
> Maybe run xymond in debug mode for one round of updates, until you get the "Got over-size message" and review the debug logs. This might provide enough additional detail to find out what's going on.
> Another approach to the problem (a truncated client data message) is to modify the client script (e.g. xymonclient-linux.sh) to truncate the ps command output, so that the total message is smaller and hopefully fits within the maximum message size. This will mean that PROC checks might not work anymore (which is likely the case now anyway). But in the current state, monitoring of the sections that come after [ps] is likely broken too. On Linux these are notably the [top] and [vmstat] sections of the client data message, which are used for the "cpu" status and several metrics for graphing. Maybe something like adding "head -1000" will cut it down to a reasonable size:
> echo "[ps]"
> ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
That's actually a great idea, and I modified the [ports] section, because I know this is the culprit (I'm running a proxy there, and all the active client connections were too much for xymon to handle).
I'm not interested in client connections anyway; I just want to monitor my running programs and ports on that server, so I replaced the original
netstat -antuW 2>/dev/null
netstat -antuT 2>/dev/null
with
netstat -tulpenW 2>/dev/null
(adding your "| head -1000" suggestion did not work, because it cut off the list before it could reach the IPv6 interfaces and thus the ports check was always red).
Now xymon works again. This is just a workaround, because I still haven't found out where exactly my messages got truncated, but I can live with this solution.
Anyway, I very much appreciate your time and efforts, thank you very much!
Cheers Christoph
I solved it!
I had to add "MAXMSG_CLIENT=1024" to /etc/xymon/xymonclient.cfg; after I restarted xymon-client, all the errors were gone.
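For anyone who needs to apply the same change on several local-mode clients, here is a minimal sketch of setting the value without creating duplicate entries. The helper name set_cfg_value is made up, the config path is the one from this thread and may differ on other distributions, and the client still has to be restarted afterwards.

```shell
# Set NAME=VALUE in a config file, replacing any existing assignment of NAME.
set_cfg_value() {
    cfg=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy everything except an existing assignment (grep exits non-zero if nothing is left).
    grep -v "^${name}=" "$cfg" > "$tmp" || true
    # Append the new assignment.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Write back in place so ownership and permissions of the original file are kept.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

A call would look like set_cfg_value /etc/xymon/xymonclient.cfg MAXMSG_CLIENT 1024, run with sufficient privileges to write the file.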
Thanks again for your help!
Cheers Christoph
Great work Christoph.
Sorry, it appears that I led you down the wrong path, asserting that it was a server-only setting in xymond. It would appear to be a client-side setting. This seems to be undocumented in the man page for xymonclient.cfg.
J
Honestly, I can't work out how this happened. A review of the code - in as much as I can understand it, not being a C programmer - shows that there's only one place the MAXMSG_CLIENT parameter is used, and that's in xymond. In particular, it's not used in the xymon client (which is the only process that logs to xymonclient.log).
I can understand how it could have come about that xymond was loaded using xymonclient.cfg for its environment, thus applying the smaller size limit to incoming messages. But if this were the case, I can't work out how you would have seen MAXMSG_CLIENT=2048 in the running xymond process's environment.
So, I'm glad you worked out a solution. But I don't think we quite understand the cause.
On 10/03/2022 23:56, Jeremy Laidman wrote:
> Honestly, I can't work out how this happened. A review of the code - in as much as I can understand it, not being a C programmer - shows that there's only one place the MAXMSG_CLIENT parameter is used, and that's in xymond. In particular, it's not used in the xymon client (which is the only process that logs to xymonclient.log).
I also dug through the source code trying to find answers, and since I'm using local mode on my clients (thus utilising the xymond_client binary), I think it makes sense (more or less).
> I can understand how it could have come about that xymond was loaded using xymonclient.cfg for its environment, thus applying the smaller size limit to incoming messages. But if this were the case, I can't work out how you would have seen MAXMSG_CLIENT=2048 in the running xymond process's environment.
The MAXMSG_CLIENT=2048 values I saw were always server-side (thanks to your env command line showing me the options currently in use). I never even saw that variable on my client, because it never got set. Only after I manually added it to xymonclient.cfg did it start working as expected.
I think it qualifies as a bug, but xymon's local mode is somewhat undocumented (the binary for it is missing from the Debian package as well, for example...), and in my opinion this should be documented somewhere.
Christoph
So, I'm glad you worked out a solution. But I don't think we quite understand the cause.
On Thu, 10 Mar 2022 at 22:41, Jeremy Laidman <jeremy at laidman.org <mailto:jeremy at laidman.org>> wrote:
Great work Christoph.
Sorry, it appears that I led you down the wrong path, asserting that it was a server-only setting in xymond. It would appear to be a client-side setting. This seems to be undocumented in the man page for xymonclient.cfg.
J
On Thu, 10 Mar 2022 at 21:18, Christoph Zechner <zechner at vrvis.at> wrote:
I solved it! I had to add and set "MAXMSG_CLIENT=1024" in /etc/xymon/xymonclient.cfg, restarted xymon-client and all the errors were gone.
Thanks again for your help!
Cheers
Christoph
On 09/03/2022 06:42, Christoph Zechner wrote:
> On 09/03/2022 00:04, Jeremy Laidman wrote:
>> On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <zechner at vrvis.at> wrote:
>>
>>     It seems I celebrated prematurely, the errors are back in exactly the
>>     same way :-/
>>
>>     2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
>>     2022-03-08 08:47:19.339786 Dropping (more) garbled data
>>
>>     I don't understand where this limit of 512 comes from, everything on
>>     the server checks out (2048 before, tried 4096 as well, no change).
>>
>> I'm at a loss. If the xymond process is proven to have this set at
>> 2048, then I see no reason why it would give that error message with
>> that number.
>>
>> Unless it's referring to another message type and hence a different
>> maximum setting? Perhaps take a look at xymond's environment again,
>> but search for all MAXMSG_ variables. See which one is set to 512, and
>> that might be the culprit. The defaults for these max values are all
>> different, with only two of them defaulting to 512: MAXMSG_CLIENT,
>> MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one
>> of them has been set to 512.
>
> Thanks, I tried that, but unfortunately, this did not help, since all
> the values were set correctly, according to my config.
>
>> The only other thing I can think of is that you have two copies of
>> xymond running, somehow with different values of MAXMSG_CLIENT. But I
>> can't think how this could come about. And you've already killed off
>> any rogue processes.
>
> Right, that's not it either. :-/
>
>> Maybe run xymond in debug mode for one round of updates, until you get
>> the "Got over-size message" and review the debug logs. This might
>> provide enough additional detail to find out what's going on.
>>
>> Another approach to solve the problem (truncated client data message)
>> is to modify the client script (eg xymonclient-linux.sh) to truncate
>> the ps command output, so that the total message size is less, and
>> hopefully fits within the max message size. This will mean that PROC
>> checks might not work anymore (which is likely the case now). But the
>> current state is that monitoring of the sections that come after [ps]
>> is likely broken now. On Linux this is notably the [top] and [vmstat]
>> sections of the client data message, that are used for the "cpu"
>> status and several metrics for graphing. Maybe something like adding
>> "head -1000" will cut it down to a reasonable size:
>>
>>     echo "[ps]"
>>     ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
>
> That's actually a great idea and I modified the [ports] section, because
> I know this is the culprit (running a proxy there and all the active
> client connections were too much for xymon to handle).
>
> I'm not interested in client connections anyway, I just want to monitor
> my running programs and ports on that server, so I replaced the original
>
>     netstat -antuW 2>/dev/null || netstat -antuT 2>/dev/null
>
> with
>
>     netstat -tulpenW 2>/dev/null
>
> (adding your "| head -1000" suggestion did not work, because it cut off
> the list before it could reach the IPv6 interfaces and thus the ports
> check was always red).
>
> Now xymon works again, although this is just a workaround, because the
> underlying problem of where exactly my messages got truncated is still
> to be found, but I can live with this solution.
>
> Anyway, I very much appreciate your time and efforts, thank you very much!
>
> Cheers
> Christoph
>
>> Also, review the client data message before the [ps] section to see if
>> there's actually something else pushing it over the limit, and [ps]
>> just happens to be where the truncation happens.
>>
>> J
>
> _______________________________________________
> Xymon mailing list
> Xymon at xymon.com
> http://lists.xymon.com/mailman/listinfo/xymon
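Jeremy's last suggestion in the quoted text (check whether a section other than [ps] is pushing the client data message over the limit) could be tried with something like the sketch below. It is only an illustration, not part of Xymon: the function name section_sizes and the input file are made up here, and it assumes you have saved a copy of the client data message to a plain file in which every section starts with a line of the form "[name]".

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Xymon): print the number of bytes
# that each [section] contributes to a saved client data message.
# Usage: section_sizes FILE
section_sizes() {
    # LC_ALL=C so that length() counts bytes rather than characters.
    LC_ALL=C awk '
        # A line such as "[ps]" starts a new section.
        /^\[[^]]+\]$/ {
            name = $0
            if (!(name in size)) order[++n] = name
        }
        # Charge every line (plus its newline) to the current section.
        name != "" { size[name] += length($0) + 1 }
        END {
            for (i = 1; i <= n; i++) printf "%8d %s\n", size[order[i]], order[i]
        }
    ' "$1"
}
```

Running "section_sizes client-message.txt | sort -rn | head" (client-message.txt being whatever file you saved the message to) lists the largest sections first. In this thread that would presumably have pointed at [ports] rather than [ps].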
On 10/03/2022 12:41, Jeremy Laidman wrote:
Great work Christoph.
Sorry, it appears that I led you down the wrong path, asserting that it was a server-only setting in xymond. It would appear to be a client-side setting. This seems to be undocumented in the man page for xymonclient.cfg.
Please, no worries, you steered me in the right direction, and increasing the message sizes on the server was not a bad idea anyhow. :-)
But yes, this is undocumented unfortunately. I already filed a bug report with the Debian maintainers, let's see what comes of it.
Christoph
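For anyone hitting the same log message: the setting is given in KiB, so the "max: 524288" in the log is 512 * 1024, and the 528383-byte message from this thread fits once the client limit is raised to 1024. A small sketch for checking which byte limit a given config file implies follows. The helper name maxmsg_client_bytes is invented for this example, and the fallback of 512 when the variable is absent is an assumption based on the default mentioned earlier in the thread, not something taken from the Xymon documentation.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Xymon): print the byte limit implied
# by the MAXMSG_CLIENT line in a config file such as xymonclient.cfg.
# Assumption: when the variable is not set, 512 (KiB) applies.
# Usage: maxmsg_client_bytes FILE
maxmsg_client_bytes() {
    # Take the last MAXMSG_CLIENT=<number> assignment, quoted or unquoted.
    kib=$(sed -n 's/^[[:space:]]*MAXMSG_CLIENT="\{0,1\}\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    # The value is in KiB, so multiply by 1024 to get bytes.
    echo $(( ${kib:-512} * 1024 ))
}
```

With MAXMSG_CLIENT=1024 in the file, "maxmsg_client_bytes /etc/xymon/xymonclient.cfg" prints 1048576. Remember that the client has to be restarted before a changed value takes effect, as described above.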
participants (2)
- jeremy@laidman.org
- zechner@vrvis.at