I've been adding a lot of hosts to our system using the XymonPSclient, which sends its data over port 443 to xymoncgimsg.cgi. I've got 1 host whose payload is over a meg, and the logs seem to indicate that it's timing out before the payload finishes sending. The client log says this: body length 1075939, timeout 100000ms
I know 100000 ms is nearly 2 minutes, but I've already got the Xymon server set to accept very large payloads and have never had this problem before. And it's only this 1 host that has a large payload. Where would I look to expand the timeout limit? Is this a PS client script issue, an Apache server issue, or a xymoncgimsg.cgi config issue?
-- Kris Springer
Kris
I'm not a WinPSClient user, but it appears that the timeout you're seeing is set on the client, and can be adjusted with the serverHttpTimeoutMs setting in xymonclient_config.xml.
Some more info about the client timeout is in this thread: https://lists.xymon.com/xymon/2018-November/045850.html
J
In reply to Kris Springer <kspringer@innovateteam.com>, Thu, 25 Jul 2024 at 06:03
Xymon mailing list -- xymon@xymon.com To unsubscribe send an email to xymon-leave@xymon.com
Thanks for that! Funny that it was my post you referenced from 2018 too! But that didn't resolve my issue. I'm thinking that there are some goofy characters in the payload that the server doesn't like. Here are some redacted lines from the client log.
Connecting to https://redacted/xymon-cgi/xymoncgimsg.cgi, body length 1075939, timeout 100000ms
Exception connecting to https://redacted/xymon-cgi/xymoncgimsg.cgi: Exception calling "GetResponse" with "0" argument(s): "The remote server returned an error: (500) Internal Server Error."
It's only this 1 host that's having an issue. There's a second host from the same network that's working fine. Same OS version even. The [procs] content in the lastcollect.txt is a huge mass of pathnames. Could it simply be too much and the server is rejecting it?
Kris Springer
In reply to Jeremy Laidman, 7/24/24 18:31
Hi Kris
I'm not sure that a large message is the cause, but I wouldn't rule it out. The thing is, the client is talking to your webserver (Apache? I can't remember if you said what webserver you're running) and is not talking directly to Xymon[*], and we know that webservers are capable of accepting large uploads in the gigabytes, whereas your logs indicate a size around one megabyte (assuming the PS client "$body.Length" is in bytes). So my guess is that something else is going on other than this being simply a size issue. But to rule this out, I would try adjusting the client so that the [procs] content is small/empty, and see if you still have the problem.
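Before trimming [procs], it may help to confirm which section actually dominates the payload. The following is a minimal sketch, assuming the captured message (for example a copy of lastcollect.txt) is plain text in which each section starts with a line such as [procs] on its own; the function name section_sizes is made up for illustration, and the counts are approximate (carriage returns are not counted).

```shell
#!/bin/sh
# Print an approximate byte count per [section] of a client message, largest first.
# Usage: section_sizes FILE
section_sizes() {
    LC_ALL=C awk '
        { sub(/\r$/, "") }                  # tolerate CRLF line endings
        /^\[[^]]+\]$/ { name = $0 }         # a line like [procs] starts a new section
        { bytes[name == "" ? "(preamble)" : name] += length($0) + 1 }
        END { for (n in bytes) printf "%d %s\n", bytes[n], n }
    ' "$1" | sort -rn
}
```

Running section_sizes lastcollect.txt | head would list the largest sections first, which shows whether [procs] really accounts for most of the roughly one megabyte.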
Could it be a timeout issue caused by the size of message? Possibly, but I'd think this alone wouldn't be a problem. But perhaps some extreme rate limiting could extend the transfer time enough to cause a timeout? I would think this unlikely.
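To put a rough number on that, the figures from the client log give the sustained rate below which the upload alone would exceed the client timeout. This is only a back-of-envelope sketch; it assumes the body length is in bytes and ignores TLS and HTTP overhead, and the function name min_rate is made up for illustration.

```shell
#!/bin/sh
# Minimum sustained rate (bytes per second, rounded down) needed to send
# BYTES within TIMEOUT_MS.
# Usage: min_rate BYTES TIMEOUT_MS
min_rate() {
    echo $(( $1 * 1000 / $2 ))
}

min_rate 1075939 100000    # body length and timeout from the client log
```

With those inputs the result is 10759, so the link would have to drop below roughly 10.8 kB/s for the whole transfer before the 100-second client timeout came into play.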
Have you checked your webserver logs to see if they show any problems? Also check the cgidebug.log file, written by all CGI programs.
It occurs to me that the "timeout 100000ms" message could be ambiguous. To some, it might indicate that there was a timeout event. But I suspect it means simply, "I'm about to try sending the body that has length L and I have set my timeout to T" right before it connects to the remote end.
[*]: While the client talks directly to the webserver, that doesn't rule out the possibility that the CGI running in the webserver connects to Xymon at the start of the session, and that it is in fact xymond telling xymoncgimsg that there's a problem. I haven't looked into how xymoncgimsg works, so it might be capturing the entire message before connecting to 127.0.0.1:1984 to deliver it, or it could be relaying blocks to 127.0.0.1:1984 as they come in from the client. If it's the former, then xymond isn't in play until the transfer is complete. I had a quick look at the source code for xymoncgimsg, and although it's only half a dozen lines, it's not entirely clear to me what it's doing (mostly because I'm not a programmer, but also because some of the details are within functions elsewhere in the source). Nevertheless, it appears to be getting the entire message first, then sending it to xymond, then possibly sending any text from xymond back to the client (sections from clientlocal.cfg). Unfortunately, there is no --debug option for xymoncgimsg, so debugging would be limited to looking at webserver logs and (probably) xymond.log.
Interestingly, xymoncgimsg uses the XYMON_TIMEOUT constant, which is defined as 15 (seconds, probably), when sending data to xymond. This puts an upper limit on how long the CGI will wait for the message from the client. But it does mean that if there's a slow transfer, it's the CGI that will terminate the session after 15 seconds rather than the client at 100 seconds. If server-side logs (web, xymond) show the start and end of this server-side message passing, then you would be able to tell if the 15 second limit is being reached. If logs don't confirm this, you might be able to create a CGI wrapper that logs start and end timestamps, while calling the real CGI binary betwixt these. Something like (untested):
#!/bin/sh
# xymoncgimsg-wrapper.cgi
# Log a timestamp before and after running the real CGI from the same directory.
echo "$(date) start" >> /tmp/msg.log
"$(dirname "$0")/xymoncgimsg.cgi"
echo "$(date) end" >> /tmp/msg.log
You would put the wrapper in the same dir as xymoncgimsg.cgi, and adjust the client URL to call this alternate version.
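If comparing against the 15-second limit by eye is awkward, a variant of the wrapper could log the elapsed time and the CGI's exit status on one line. This is an untested-in-production sketch: the helper name run_timed and the file name xymoncgimsg-timed.cgi are made up for illustration, and date +%s (epoch seconds) is a common extension rather than strict POSIX.

```shell
#!/bin/sh
# Run a command, then append its exit status and elapsed wall-clock seconds
# to a log file. stdin and stdout pass through untouched, so the POST body
# still reaches the real CGI and its reply still reaches the client.
# Usage: run_timed LOGFILE COMMAND [ARGS...]
run_timed() {
    log=$1
    shift
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "$(date) pid=$$ rc=$rc elapsed=$((end - start))s" >> "$log"
    return "$rc"
}

# As a CGI wrapper (hypothetical xymoncgimsg-timed.cgi), the script would end with:
#   run_timed /tmp/msg.log "$(dirname "$0")/xymoncgimsg.cgi"
#   exit $?
```

An elapsed value close to 15 for the failing host, together with a non-zero rc, would point at the server-side timeout rather than the client's.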
I think that all of Xymon's CGI files in the cgi-bin directory are actually the same wrapper binary, which then calls the real binary located elsewhere. That binary can be run in a debug mode, and log to the file cgidebug.log (in the same dir as other logfiles such as xymond.log). Reading the source of cgiwrap, it appears that debug mode can be enabled by setting CGIDEBUG in cgioptions.cfg. While the xymoncgimsg source doesn't invoke any debugging itself, the wrapper debugging might provide sufficient meta-data to give some insight.
A few other things you could try:
- see if an almost empty client message causes the same problem, perhaps increasing the message size until failure
- try sending a captured message from a command line on the Windows server (eg using curl for Windows) as a post message, and see if you get the same results, possibly trying different content or parameters, until you find what triggers the fault
- use an http URL instead of an https URL, and then do a packet capture on port 80 to view the message content, conversation and timing
- do a packet capture on port 443 just to see how long the TCP session actually takes to complete
- recompile xymoncgimsg.cgi after replacing XYMON_TIMEOUT with 30 or 300, to see if it makes a difference.
- include strace in the wrapper (eg: strace `dirname $0`/xymoncgimsg.cgi 2>>/tmp/msg.log) and see what the CGI is actually doing if the connection is severed prematurely
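The first two suggestions in the list can be combined into one rough sketch: POST growing prefixes of a captured message with curl and note where the status changes. Everything specific here is an assumption: the URL is a placeholder, the function names are made up, the Content-Type the PS client sends is not stated in the thread (curl's default for --data-binary is used; add -H if the server needs something else), and head -c is a common extension rather than strict POSIX. Note that each POST submits a real, possibly truncated, client message to the server, and a truncated message might be rejected for its own reasons, so treat the results as a hint only.

```shell
#!/bin/sh
# Placeholder URL; override with the real one, e.g. URL=https://.../xymoncgimsg.cgi
URL="${URL:-https://xymon.example.com/xymon-cgi/xymoncgimsg.cgi}"

# POST one file as the raw request body; print HTTP status and total seconds.
# Usage: post_msg FILE
post_msg() {
    curl --silent --output /dev/null \
         --max-time 120 \
         --data-binary "@$1" \
         --write-out '%{http_code} %{time_total}\n' \
         "$URL"
}

# Send the first SIZE bytes of CAPTURE for each SIZE given, smallest first,
# to see at what size (if any) the server starts returning errors.
# Usage: try_sizes CAPTURE SIZE [SIZE...]
try_sizes() {
    capture=$1
    shift
    part="/tmp/xymon-part.$$"
    for size in "$@"; do
        head -c "$size" "$capture" > "$part"
        printf '%s bytes: ' "$size"
        post_msg "$part"
    done
    rm -f "$part"
}
```

For example, try_sizes lastcollect.txt 1000 100000 500000 1075939 would print one line per size with the HTTP status and transfer time. If even the smallest prefix returns 500, size is probably not the trigger and the content or the server configuration is the next thing to look at.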
I hope some of this helps you get it sorted out. It's a weird problem for sure.
Cheers Jeremy
In reply to Kris Springer <kspringer@innovateteam.com>, Fri, 26 Jul 2024 at 02:10
participants (2)
- Jeremy Laidman
- Kris Springer