TIME_WAIT making client report too big
I have a couple of servers that I just started monitoring. These
servers exist to run haproxy, a load balancer. The haproxy program
tends to leave a lot of TIME_WAIT connections in the netstat output.
This is resulting in a client report that is too big, and I am getting
false red alarms on procs, because it can't find the cron process.
Is there any way to have specific clients filter out TIME_WAIT before
sending in their reports? If so, can it be controlled from the server?
I have also been looking for ways to make haproxy clean up after itself
better, but so far that has been a dead end.
The client is CentOS 6 running 4.3.7 from a package; the server is running 4.3.0 beta from packages in the Debian repository. I know I need to upgrade the server. It's on my to-do list, but because of the hobbit->xymon change throughout the program, upgrading is not going to be pretty, and I keep putting it off.
Thanks, Shawn
On 4/13/2012 12:48 PM, Shawn Heisey wrote:
> I have a couple of servers that I just started monitoring. These servers exist to run haproxy, a load balancer. The haproxy program tends to leave a lot of TIME_WAIT connections in the netstat output.
> This is resulting in a client report that is too big, and I am getting false red alarms on procs, because it can't find the cron process.
I got a couple of off-list replies, one of which pointed me at a page about TCP disruption problems due to TIME_WAIT. I am not having a problem with the server, or with the haproxy process that is generating the TIME_WAIT entries. There are not enough of them to actually cause network disruption. The only thing having a problem here is xymon.
I did find a fairly non-disruptive way to deal with it: I bumped up the max message sizes on the server. So far the false red alarm has not come back. I also discovered that a few of my other hosts were having their reports truncated, though I did not know it until today.
Thanks, Shawn
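For readers looking for the setting behind "max message sizes": the following is only a sketch of what such a change might look like, not the exact change made above. It assumes the MAXMSG_* variables described in the xymonserver.cfg man page (sizes in kB), and the values shown are illustrative. On a 4.3.0 beta "hobbit" package the file is likely named hobbitserver.cfg instead, and the server probably needs a restart for new limits to take effect.

```shell
# Assumed fragment of xymonserver.cfg (hobbitserver.cfg on older packages).
# Values are illustrative only; pick sizes that fit your largest reports.
MAXMSG_CLIENT="1024"   # upper limit for "client" messages, in kB
MAXMSG_STATUS="512"    # upper limit for "status" messages, in kB
```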
On 13-04-2012 20:48, Shawn Heisey wrote:
> I have a couple of servers that I just started monitoring. These servers exist to run haproxy, a load balancer. The haproxy program tends to leave a lot of TIME_WAIT connections in the netstat output. This is resulting in a client report that is too big, and I am getting false red alarms on procs, because it can't find the cron process.
> Is there any way to have specific clients filter out TIME_WAIT before sending in their reports? If so, can it be controlled from the server?
You can't do it on the Xymon server, because the data is truncated before reaching the Xymon server.
But you can edit the xymonclient-<OSNAME>.sh script running on the haproxy-server. Just change the "netstat" command and add a "| grep -v TIME_WAIT" to filter out the time-wait entries in the netstat output.
Regards, Henrik
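The filter suggested above can be tried out on its own before touching the client script. This is a minimal sketch: the helper name filter_time_wait and the sample addresses are made up for illustration, and the sample lines only imitate the layout of "netstat -ant" output on Linux.

```shell
# Hypothetical helper for illustration; it is not part of the Xymon client.
# It drops every line that contains the string TIME_WAIT.
filter_time_wait() {
    grep -v TIME_WAIT
}

# Made-up sample lines in the general layout of "netstat -ant" on Linux.
sample='tcp        0      0 0.0.0.0:80        0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:80       10.0.0.9:51234     TIME_WAIT
tcp        0      0 10.0.0.5:80       10.0.0.7:40022     ESTABLISHED'

# Keep the filtered result in a variable so it can be inspected or printed.
filtered=$(printf '%s\n' "$sample" | filter_time_wait)
printf '%s\n' "$filtered"
```

In the client script itself, the change would amount to appending the same pipe to the existing command, for example turning a line along the lines of "netstat -ant 2>/dev/null" into "netstat -ant 2>/dev/null | grep -v TIME_WAIT". The exact netstat invocation in your xymonclient-<OSNAME>.sh may differ, so check the script rather than copying this verbatim. Note that once these entries are filtered out on the client, the server can no longer see or test TIME_WAIT connections for that host.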
participants (2): henrik@hswn.dk, hobbit@elyograg.org