Hi Kris
I'm not sure that a large message is the cause, but I wouldn't rule it out. The thing is, the client is talking to your webserver (Apache? I can't remember if you said what webserver you're running) and is not talking directly to Xymon[*], and we know that webservers are capable of accepting large uploads in the gigabytes, whereas your logs indicate a size around one megabyte (assuming the PS client "$body.Length" is in bytes). So my guess is that something else is going on other than this being simply a size issue. But to rule this out, I would try adjusting the client so that the [procs] content is small/empty, and see if you still have the problem.
Could it be a timeout issue caused by the size of the message? Possibly, but I'd think size alone wouldn't be a problem. Perhaps some extreme rate limiting could extend the transfer time enough to cause a timeout, but I would think this unlikely.
Have you checked your webserver logs to see if it's showing any problems? Also check the cgidebug.log file, written by all CGI
It occurs to me that the "timeout 100000ms" message could be ambiguous. To some, it might indicate that there was a timeout event. But I suspect it means simply, "I'm about to try sending the body that has length L and I have set my timeout to T" right before it connects to the remote end.
[*]: While the client talks directly to the webserver, that doesn't rule out the possibility that the CGI running in the webserver connects to Xymon at the start of the session, and it is indeed xymond that is telling xymoncgimsg that there's a problem. I haven't looked into how xymoncgimsg works, so it might be capturing the entire message before connecting to
127.0.0.1:1984 to deliver it, or it could be relaying blocks to 127.0.0.1:1984 as they come in from the client. If it's the former, then xymond isn't in play until the transfer is complete. I had a quick look at the source code for xymoncgimsg, and although it's only half a dozen lines, it's not entirely clear to me what it's doing (mostly because I'm not a programmer, but also because some of the details are within functions elsewhere in the source). Nevertheless, it appears to be getting the entire message first, then sending it to xymond, then possibly sending any text from xymond back to the client (sections from clientlocal.cfg). Unfortunately, there is no --debug option for xymoncgimsg, so debugging would be limited to looking at webserver logs and (probably) xymond.log.
Interestingly, xymoncgimsg uses the XYMON_TIMEOUT constant, which is defined as 15 (seconds, probably), when sending data to xymond. This puts an upper limit on how long the CGI will wait for the message from the client. But it does mean that if there's a slow transfer, it's the CGI that will terminate the session after 15 seconds rather than the client at 100 seconds. If server-side logs (web, xymond) show the start and end of this server-side message passing, then you would be able to tell if the 15 second limit is being reached. If logs don't confirm this, you might be able to create a CGI wrapper that logs start and end timestamps, while calling the real CGI binary betwixt these. Something like (untested):
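#!/bin/sh
# xymoncgimsg-wrapper.cgi: log timestamps around the real CGI
echo "$(date) start" >> /tmp/msg.log
# stdin (the POST body) and the CGI environment pass straight through
"$(dirname "$0")"/xymoncgimsg.cgi
rc=$?
echo "$(date) end (exit $rc)" >> /tmp/msg.log
exit $rc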
You would put the wrapper in the same dir as xymoncgimsg.cgi, and adjust the client URL to call this alternate version.
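If comparing timestamps by eye gets tedious, a variant of the wrapper could log the elapsed seconds directly, which makes it easier to spot runs that sit near the 15 second mark. This is only a sketch (also untested against a real Xymon install): the timed_run function and the LOG variable are names I've made up, CONTENT_LENGTH is the standard CGI variable that the webserver should set for a POST, and "date +%s" assumes a GNU or BSD date.

```shell
#!/bin/sh
# Hypothetical wrapper variant: logs elapsed seconds as well as timestamps.
# LOG and timed_run are illustrative names, not part of Xymon.
LOG="${LOG:-/tmp/msg.log}"

# timed_run CMD [ARGS...]: run CMD, logging start, end, exit code and duration
timed_run() {
    start=$(date +%s)
    echo "$(date) start pid=$$ len=${CONTENT_LENGTH:-unknown}" >> "$LOG"
    "$@"                      # stdin and environment are inherited by CMD
    rc=$?
    end=$(date +%s)
    echo "$(date) end pid=$$ rc=$rc elapsed=$((end - start))s" >> "$LOG"
    return "$rc"
}

# In the real wrapper, the last line would be something like:
# timed_run "$(dirname "$0")/xymoncgimsg.cgi"
```

Logging the pid lets you pair up start and end lines if two clients report at the same time.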
I think that the CGI files in Xymon's cgi-bin directory are all the same wrapper binary, which then calls the real binary located elsewhere. That wrapper can be run in a debug mode, logging to the file cgidebug.log (in the same dir as other logfiles such as xymond.log). Reading the source of cgiwrap, it appears that debug mode can be enabled by setting CGIDEBUG in cgioptions.cfg. While the xymoncgimsg source doesn't invoke any debugging itself, the wrapper debugging might provide enough metadata to give some insight.
A few other things you could try:
* see if an almost empty client message causes the same problem, perhaps increasing the message size until failure
* try sending a captured message from a command line on the Windows server (eg using curl for Windows) as a POST request, and see if you get the same results, possibly trying different content or parameters, until you find what triggers the fault
* use an http URL instead of an https URL, and then do a packet capture on port 80 to view the message content, conversation and timing
* do a packet capture on port 443 just to see how long the TCP session actually takes to complete
* recompile xymoncgimsg.cgi after replacing XYMON_TIMEOUT with 30 or 300, to see if it makes a difference
* include strace in the wrapper (eg: strace "$(dirname "$0")"/xymoncgimsg.cgi 2>>/tmp/msg.log) and see what the CGI is actually doing if the connection is severed prematurely
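For the first two suggestions, something along these lines might do as a starting point. It's a rough sketch: captured.txt, the URL and the function names are placeholders I've invented, and I'm assuming a curl that understands --data-binary and -w (any recent one should). Note that curl will send its default Content-Type for POST data, which may not match what the PS client sends, and that a message cut off mid-section might be handled differently by xymond than a complete one, so treat odd results with caution.

```shell
#!/bin/sh
# Sketch only: replay a captured client message at increasing sizes.
# captured.txt and the URL below are hypothetical placeholders.

# cut_msg SRC SIZE DST: copy the first SIZE bytes of SRC to DST
cut_msg() {
    head -c "$2" "$1" > "$3"
}

# send_msg FILE URL: POST FILE unmodified, print HTTP status and total time
send_msg() {
    curl -sS -o /dev/null --max-time 120 \
         -w '%{http_code} %{time_total}s\n' \
         --data-binary @"$1" "$2"
}

# Example loop (commented out so that nothing is sent by accident):
# URL='https://xymon.example.com/path/to/xymoncgimsg.cgi'
# for size in 1024 16384 262144 1048576; do
#     cut_msg captured.txt "$size" "/tmp/msg.$size"
#     printf '%s bytes: ' "$size"
#     send_msg "/tmp/msg.$size" "$URL"
# done
```

If the failures start at a particular size, or the total time creeps towards 15 seconds just before they start, that would point at either a size limit or the timeout respectively.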
I hope some of this helps you get it sorted out. It's a weird problem for sure.
Cheers
Jeremy