On 4/13/2012 12:48 PM, Shawn Heisey wrote:
I have a couple of servers that I just started monitoring. These servers exist to run haproxy, a load balancer. The haproxy program tends to leave a lot of TIME_WAIT connections in the netstat output.
This makes the client report too big, and I am getting false red alarms on procs because Xymon can't find the cron process in the truncated report.
I got a couple of off-list replies, one of which pointed me at a page about TCP disruption problems due to TIME_WAIT. I am not having a problem with the server, or with the haproxy process that is generating the TIME_WAIT entries. There are not enough of them to actually cause network disruption. The only thing having a problem here is Xymon.
I did find a fairly non-disruptive way to deal with it - I bumped up the max message sizes on the server. So far the false red alarm has not come back. I discovered that a few of my other hosts were having their reports truncated as well, though I did not know it until today.
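For anyone wanting to check whether TIME_WAIT entries are what pushes a report over the limit, here is a rough sketch in Python. It is not part of Xymon; the function names, the sample listing, and the 512 kB figure are illustrative assumptions, so substitute whatever limit your own server is configured with.

```python
def time_wait_share(netstat_text):
    """Return (time_wait_bytes, total_bytes) for a netstat-style listing.

    Hypothetical helper: it only looks at the text it is given and
    knows nothing about how Xymon assembles its client report.
    """
    total = 0
    time_wait = 0
    for line in netstat_text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        total += size
        # The TCP state is one of the whitespace-separated columns.
        if "TIME_WAIT" in line.split():
            time_wait += size
    return time_wait, total


def exceeds_limit(total_bytes, limit_kb):
    """True if a message of total_bytes is larger than limit_kb kilobytes."""
    return total_bytes > limit_kb * 1024


if __name__ == "__main__":
    # Made-up sample lines; real input would be saved netstat output.
    sample = (
        "tcp 0 0 10.0.0.1:80 10.0.0.2:40001 TIME_WAIT\n"
        "tcp 0 0 10.0.0.1:80 10.0.0.3:40002 ESTABLISHED\n"
    )
    tw, total = time_wait_share(sample)
    print("TIME_WAIT bytes:", tw, "of", total)
    # 512 is an assumed limit in kB, used here only for illustration.
    print("over limit:", exceeds_limit(total, 512))
```

Running this against the netstat section of a saved client report gives a rough idea of how much of the message the TIME_WAIT lines account for, and whether raising the limit is likely to be enough.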
Thanks, Shawn