"Discarding timed-out partial msg" Error Messages
Hey all,
Lately, I've been seeing quite a few error messages show up in xymond indicating that it was discarding a timed-out partial message from some machine.
For example:
Latest error messages: Discarding timed-out partial msg from X.X.X.X
They seem to be happening sporadically but more often than usual as of late. Maybe one or two every couple of days or so. They don't seem to be coming from the same machine/machines either.
Is this something I should be worried about? Are there any side-effects from this happening too much?
What are the causes of this happening? Any way to make it not happen as much?
Any ideas or advice is greatly appreciated!
Thanks!!
-- Matt Vander Werf
On Tue, November 24, 2015 6:29 am, Matt Vander Werf wrote:
Broadly speaking, this happens when the entire message does not arrive within the time xymond allows, which is 10 seconds by default. It could be caused by network congestion or packet loss, slow sender performance, or slow Xymon server performance.
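As a rough illustration of that deadline, here is a toy model. It is not xymond's source. It assumes the sender marks the end of its message by closing its side of the connection. If that close has not happened within the timeout, whatever has arrived so far is only a partial message and gets thrown away. The `receive_message` helper and the chunk timings are hypothetical.

```python
from typing import List, Optional, Tuple

# (seconds after the connection was accepted, bytes that arrived at that moment)
Chunk = Tuple[float, bytes]


def receive_message(chunks: List[Chunk], eof_at: float,
                    timeout: float = 10.0) -> Optional[bytes]:
    """Toy model of a per-connection receive deadline (not xymond's code).

    Assumes the message is complete when the sender closes its end at `eof_at`.
    If that has not happened within `timeout` seconds, the data received so far
    is only a partial message and is discarded (None is returned).
    """
    if eof_at > timeout:
        return None  # deadline expired first: "timed-out partial msg"
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(data for _, data in ordered)


# A status report trickling in over a congested link; the sender closes at 12.5 s.
slow = [
    (0.2, b"status host1.cpu green "),
    (6.0, b"load 0.42 "),
    (12.4, b"up 3 days"),
]
print(receive_message(slow, eof_at=12.5))              # None with the 10 s default
print(receive_message(slow, eof_at=12.5, timeout=20))  # the complete message
```

With a longer limit, the same slow but otherwise healthy delivery succeeds. That is why raising the timeout works as a quick fix. It does nothing for a sender that never finishes its message.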
A quick fix might be to increase xymond's --timeout= option to something like 15 or 20 seconds.
If netstat shows tons of simultaneous connections, you could also increase --lqueue= to 768 or 1024.
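On Linux you can also get that count without netstat by reading /proc/net/tcp directly. The sketch below groups the sockets on Xymon's default port, 1984, by TCP state. `count_port_states` and `XYMON_PORT` are hypothetical names. The sketch only covers IPv4 sockets; pass it the contents of /proc/net/tcp6 to cover IPv6. A consistently large ESTABLISHED count is roughly the situation the --lqueue= suggestion is aimed at.

```python
import os
from collections import Counter

XYMON_PORT = 1984  # Xymon's default port; change it if yours differs

# Linux TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_port_states(proc_net_tcp: str, port: int = XYMON_PORT) -> Counter:
    """Count sockets whose local port is `port`, grouped by TCP state name."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:07C0" (hex IP : hex port)
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        print(count_port_states(f.read()))
```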
Are there any patterns on the clients/senders that are affected? Unusually huge messages being sent over slow connections?
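For the "huge messages over slow connections" case, a back-of-the-envelope check helps. Take the message size in bits, divide by the effective throughput, and add one round trip for the TCP handshake. This ignores packet loss and retransmissions, so treat the result as a best case. The function names, message sizes, and link speeds below are made-up examples.

```python
def estimated_transfer_seconds(message_bytes: int, throughput_bps: float,
                               rtt_seconds: float = 0.0) -> float:
    """Best-case delivery time: one round trip for the TCP handshake plus the
    time to push the message through at the effective throughput (bits/s).
    Packet loss and retransmissions are ignored, so real transfers take longer."""
    return rtt_seconds + message_bytes * 8 / throughput_bps


def fits_in_timeout(message_bytes: int, throughput_bps: float,
                    timeout_s: float = 10.0, rtt_seconds: float = 0.0) -> bool:
    """True if even the best case finishes before the receiver's deadline."""
    return estimated_transfer_seconds(message_bytes, throughput_bps, rtt_seconds) <= timeout_s


# Hypothetical 2 MB client report over a link that only manages 1 Mbit/s:
print(estimated_transfer_seconds(2_000_000, 1_000_000))      # 16.0 seconds
print(fits_in_timeout(2_000_000, 1_000_000))                 # False with a 10 s limit
print(fits_in_timeout(200_000, 1_000_000, rtt_seconds=0.1))  # True (about 1.7 s)
```

If the best case already exceeds the limit, a longer --timeout= will help. Otherwise, shrinking the message is the more durable fix.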
If there isn't a network issue per se and there are no local network errors (or the reports are about messages from all over the place), then it's time to look at network performance tuning on the Xymon box. Consider the various tcp* options via sysctl (net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse in particular). If xymonnet is running on the same system and you're doing high-concurrency testing, be sure to widen net.ipv4.ip_local_port_range so there are enough local ports for outbound connections.
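Before changing anything, it can help to record the current values. This sketch reads a few of the relevant settings straight from /proc/sys, which holds the same values `sysctl` reports, and works out how many ephemeral ports the local port range allows. The choice of keys and the helper names are assumptions. Keys that don't exist on a given kernel come back as None; tcp_tw_recycle, for example, was removed in Linux 4.12. net.core.somaxconn is included because the kernel silently caps any listen backlog at that value, and that cap includes one requested with --lqueue=.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Settings worth recording before tuning (this selection is an assumption).
SYSCTLS = (
    "net/ipv4/ip_local_port_range",
    "net/ipv4/tcp_tw_reuse",
    "net/ipv4/tcp_tw_recycle",  # removed in Linux 4.12, so often missing
    "net/ipv4/tcp_fin_timeout",
    "net/core/somaxconn",       # upper bound on any listen backlog
)


def read_sysctls(root: str = "/proc/sys",
                 names: Iterable[str] = SYSCTLS) -> Dict[str, Optional[str]]:
    """Map dotted sysctl names to their current values (None if absent)."""
    values: Dict[str, Optional[str]] = {}
    for name in names:
        key = name.replace("/", ".")
        try:
            values[key] = (Path(root) / name).read_text().strip()
        except OSError:
            values[key] = None  # not present (or not readable) on this kernel
    return values


def ephemeral_port_count(port_range: str) -> int:
    """Number of local ports in an ip_local_port_range value like '32768 60999'."""
    low, high = (int(part) for part in port_range.split())
    return high - low + 1


if __name__ == "__main__":
    current = read_sysctls()
    for key, value in current.items():
        print(f"{key} = {value}")
    port_range = current["net.ipv4.ip_local_port_range"]
    if port_range is not None:
        print("ephemeral ports available:", ephemeral_port_count(port_range))
```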
http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/ is a nice resource for that.
HTH,
-jc
Participants (2): cleaver@terabithia.org, matt1299@gmail.com