Gautier Begin wrote:
Andy,
I'm using Solaris 10.5 in a cluster zone configuration for both the main server and the proxy. I also have a small proxy under Ubuntu Linux. The Xymon version is 4.3.12.
My proxy under Solaris is now working fine with ~900 targets. Here are the steps I took:
*0- Use a tool to observe the behaviour of the network* on the system. I used netstat on the zone and lsof -i :1984 on the global zone (physical node of the cluster)
Here is my Perl script, to be run in the zone (netstat):
    $total = 0 ;
    $big_total = 0 ;
    @netstat = `netstat -naP tcp`;
    my %Con_Status ;
    my %Con_Status_Total ;
    foreach $ln (@netstat)
    {
        chomp($ln) ;
        @elts = split(/ +/,$ln) ;
        if (( $#elts > 5 ) && ( $ln =~ /[0-9]+.*[A-Z]+/))
        {
            $big_total++ ;
            unless ( exists($Con_Status_Total{$elts[$#elts]}) )
            {
                $Con_Status_Total{$elts[$#elts]} = 1 ;
            } else {
                $Con_Status_Total{$elts[$#elts]} = $Con_Status_Total{$elts[$#elts]} + 1 ;
            }
        }
        if ( $ln =~ /\.1984 +/ )
        {
            unless ( exists($Con_Status{$elts[$#elts]}) )
            {
                $Con_Status{$elts[$#elts]} = 1 ;
            } else {
                $Con_Status{$elts[$#elts]} = $Con_Status{$elts[$#elts]} + 1 ;
            }
        }
    }
    print " State\t\tPort 1984\tTotal\n=======================================\n" ;
    foreach $Conn_State (sort keys %Con_Status_Total )
    {
        unless ( exists($Con_Status{$Conn_State}) ) { $Con_Status{$Conn_State} = 0 ; }
        if ( length($Conn_State) < 7 ) { $col = "\t\t" ; } else { $col = "\t" ; }
        print " $Conn_State$col$Con_Status{$Conn_State}\t\t$Con_Status_Total{$Conn_State}\n" ;
        $total = $total + $Con_Status{$Conn_State} ;
    }
    print "=======================================\n TOTAL\t\t$total\t\t$big_total\n" ;
*1- Tune and configure how Solaris manages the network* using the ndd command:
    ndd -set /dev/tcp tcp_time_wait_interval 2000
    ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
    ndd -set /dev/tcp tcp_ip_abort_interval 300000
    ndd -set /dev/tcp tcp_keepalive_interval 7200000
    ndd -set /dev/tcp tcp_rexmit_interval_max 4000
    ndd -set /dev/tcp tcp_rexmit_interval_min 3000
    ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
    ndd -set /dev/tcp tcp_smallest_anon_port 1024

    ndd -set /dev/tcp tcp_conn_req_max_q 2048
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    ndd -set /dev/tcp tcp_slow_start_initial 4

    ndd -set /dev/tcp tcp_xmit_hiwat 262144
    ndd -set /dev/tcp tcp_recv_hiwat 262144
    ndd -set /dev/tcp tcp_max_buf 1048576
*2- Modify the program xymonproxy.c*
As I said previously, sockets are not handled well in this program (their closure is not managed). Because I know very little about C programming, I just "arranged" the program, and it remains a dirty solution (see the so_linger / setsockopt part of the diff).
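For readers unfamiliar with the option, here is a minimal sketch of how SO_LINGER can be set and read back on a TCP socket. The helper name set_linger is hypothetical and is not part of xymonproxy.c. As POSIX describes the option, l_onoff = 0 leaves lingering disabled (and l_linger is then not used), while a nonzero l_onoff makes close() wait up to l_linger seconds for unsent data. The diff further down in this message sets l_onoff to 0, so by that description it would appear to leave lingering switched off; whether that is what changed the proxy's behaviour is not established here.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from xymonproxy.c): apply SO_LINGER to fd,
 * then read the setting back into *result so the caller can see what
 * the system actually stored. Returns 0 on success, -1 on failure. */
int set_linger(int fd, int onoff, int seconds, struct linger *result)
{
    struct linger lg;
    socklen_t len = sizeof(*result);

    memset(&lg, 0, sizeof(lg));
    lg.l_onoff = onoff;     /* 0 = lingering off, nonzero = lingering on */
    lg.l_linger = seconds;  /* only meaningful when l_onoff is nonzero */

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;

    /* Read the option back rather than assuming it took effect. */
    return getsockopt(fd, SOL_SOCKET, SO_LINGER, result, &len);
}
```

Calling set_linger(fd, 1, 10, &lg) on a freshly created socket would enable a 10-second linger; calling it with onoff = 0 reports l_onoff as 0 again.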
I also modified line 973 and following, because the verbose logging (the "select failed" message) was slowing down the proxy. The best thing would be to solve the underlying issue, but I did not.
    # diff xymonproxy.c xymonproxy.c.ORIG
    230d229
    < struct linger so_linger;
    715,717d713
    < so_linger.l_onoff = 0;
    < so_linger.l_linger = 10;
    < setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
    977,981c973,976
    < /* if (n < 0) { */
    < /* errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno)); */
    < /* } */
    < /* else if (n == 0) { */
    < if (n == 0) {
    ---
    > if (n < 0) {
    > errprintf("select() failed: %s\n", strerror(errno));
    > }
    > else if (n == 0) {
    1001c996
    < else if ( n > 0 ) {
    ---
    > else {

*3- XYMON proxy conf*
Because of the large number of targets:
In xymonserver.cfg on the proxy, I set MAXMSGSPERCOMBO="500".
In xymonserver.cfg on the main server, I set:
    MAXMSGSPERCOMBO="500"
    MAXLINE="5242880"
    MAXMSG_CLIENT="5242880"
    MAXMSG_DATA="5242880"
    MAXMSG_STACHG="5242880"
    MAXMSG_STATUS="5242880"
    MAXMSG_NOTES="5242880"
    MAXMSG_PAGE="5242880"
    MAXMSG_ENADIS="5242880"
    MAXMSG_CLICHG="5242880"
This part is not really tuned (the figures are probably larger than necessary), but it works.
Regards,
Gautier BEGIN
System Tools Team Lead CACEIS and APERAM accounts CSC Computer Sciences Luxembourg S.A. 12D Impasse Drosbach L-1882 Luxembourg
From: Andy Smith <abs at shadymint.com>
To: xymon at xymon.com
Date: 05/04/2014 02:50 PM
Subject: Re: [Xymon] XYMON Proxy Issue
Sent by: "Xymon" <xymon-bounces at xymon.com>
Hi,
In February, Gautier reported this issue with xymonproxy on Solaris :-
http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
I have come this week to update an installation of 4.2.3 on Solaris 9 and have encountered the exact same issue as Gautier, but this time on the latest 4.3.17 code :-
    2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
    2014-05-04 13:20:41 Listening on 0.0.0.0:1984
    2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 select() failed: Invalid argument
    2014-05-04 13:20:41 Too many select failures, aborting
    2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
I do not suffer the connections in TIME_WAIT, just the constant restarting of the proxy every 15 minutes. Here is the truss as it gasps when falling over :-
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206937
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206938
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206939
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206940
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206941
    poll(0xFFBFF208, 1, 1000) = 0
    time() = 1399206942
    poll(0xFFBFF208, 1, 1000) = 1
    accept(3, 0x0003AC60, 0xFFBFF310, 1) = 4
    fcntl(4, F_SETFL, 0x00000080) = 0
    time() = 1399206942
    poll(0xFFBFF200, 2, 1000) = 1
    read(4, " s t a t u s + 4 5 c s".., 8185) = 140
    time() = 1399206942
    poll(0xFFBFF200, 2, 1000) = 1
    read(4, 0x00038CE2, 8045) = 0
    time() = 1399206942
    shutdown(4, 2, 1) = 0
    close(4) = 0
    poll(0xFFBFF208, 1, 1000) = 1
    accept(3, 0x0003ACD0, 0xFFBFF310, 1) = 4
    fcntl(4, F_SETFL, 0x00000080) = 0
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " s e l e c t ( ) f a i".., 34) = 34
    time() = 1399206942
    write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
    write(2, " ", 1) = 1
    write(2, " T o o m a n y s e l".., 35) = 35
    _exit(1)
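In the truss output, no poll() call appears between the last fcntl() and the first error write, which suggests (though it does not prove) that select() is rejecting its arguments before any system call is made. POSIX lists two conditions under which select() fails with EINVAL: an invalid timeout interval, or an nfds value that is negative or greater than FD_SETSIZE. The sketch below only illustrates those documented conditions; the helper name try_select is hypothetical, and nothing here establishes which condition, if either, xymonproxy is hitting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: call select() with an empty read set and the
 * given nfds and timeout. Returns 0 if select() succeeded, otherwise
 * the errno value it reported. */
int try_select(int nfds, long sec, long usec)
{
    fd_set rfds;
    struct timeval tmo;

    FD_ZERO(&rfds);
    tmo.tv_sec = sec;
    tmo.tv_usec = usec;

    if (select(nfds, &rfds, NULL, NULL, &tmo) < 0)
        return errno;
    return 0;
}
```

With a sane timeout and nfds of 0 this returns 0 after the timeout expires; with nfds of -1, or with a negative tv_sec, it returns EINVAL. Logging nfds and the timeout fields just before the real select() call in the proxy would be one way to check whether either case applies.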
So, question to Gautier, are you using Solaris 9 and have you managed to resolve this?
Another question to the rest of the list: this is actually the only proxy I have on Solaris; all the others are on Red Hat. Is anyone else using xymonproxy on Solaris and, if so, what version? For the time being I am running the old bbproxy until I get this fixed; the rest of 4.3.17 seems to be working OK.
Thanks for any feedback.
Andy
Gautier,
My issue is not a matter of performance or resources; I have only 3 servers in this DMZ. Thanks for the complete information, though. It is also a concern that this still happens with recent versions of Solaris: I would be prepared to accept that Solaris 9 might behave incorrectly, but I would have hoped that Solaris 10 had fixed this.
Maybe I will go back to the differences between the code for bbproxy at 4.2.3 and xymonproxy at 4.3.17 for a clue as to what is going on.
-- Andy