We're currently running performance tests using Solaris 11 (SPARC) on some large hardware. The tests, which consist of sending SOAP requests (50kb per request), run well until we get into the tens of thousands of users (e.g. 30,000), at which point, roughly 2 minutes into the run, we start seeing a number of connection time-out errors in the logs. CPU and memory usage remain low, never exceeding 15% at any time. We are using WebLogic 11g and Oracle HTTP Server.
I have adjusted the following TCP parameters; however, they don't seem to have made any significant difference (the commands I used are sketched after the list):
_conn_req_max_q = 262144 (also tried 16384)
_conn_req_max_q0 = 16384 (also tried 4096 - increased to stop tcpListenDropQ0 rising above 0)
_time_wait_interval = 15000
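For reference, this is roughly how I applied them. On Solaris 11 the old ndd tunables are exposed as private ipadm TCP properties; the exact property names below are my assumption, taken from the underscore-prefixed names above:

# set the private TCP properties (Solaris 11 replacement for ndd -set /dev/tcp)
ipadm set-prop -p _conn_req_max_q=262144 tcp
ipadm set-prop -p _conn_req_max_q0=16384 tcp
ipadm set-prop -p _time_wait_interval=15000 tcp
# confirm the values took effect
ipadm show-prop -p _conn_req_max_q,_conn_req_max_q0,_time_wait_interval tcp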
I also added the following to /etc/system:
set ip:ipcl_conn_hash_size=16834
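After the reboot I sanity-checked that the setting had actually been applied (assuming mdb -k exposes the ip module variable under the same name):

# read the live kernel value of the tunable set in /etc/system
echo "ipcl_conn_hash_size/D" | mdb -k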
Running netstat -sP tcp at the end of the test (the server was rebooted before the test started) produces the following:
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 200
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens =133886 tcpPassiveOpens =584461
tcpAttemptFails =102899 tcpEstabResets =553474
tcpCurrEstab = 339 tcpOutSegs =35235864
tcpOutDataSegs =20302930 tcpOutDataBytes =842489656
tcpRetransSegs = 92070 tcpRetransBytes =337976
tcpOutAck =2044606 tcpOutAckDelayed =252534
tcpOutUrg = 0 tcpOutWinUpdate = 0
tcpOutWinProbe = 0 tcpOutControl =901262
tcpOutRsts = 29486 tcpOutFastRetrans = 0
tcpInSegs =39352489
tcpInAckSegs = 0 tcpInAckBytes =2742139410
tcpInDupAck = 32470 tcpInAckUnsent = 0
tcpInInorderSegs =15010534 tcpInInorderBytes =1321218448
tcpInUnorderSegs = 1515 tcpInUnorderBytes =2008280
tcpInDupSegs = 47362 tcpInDupBytes =160101
tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 0 tcpInWinUpdate = 0
tcpInClosed = 1099 tcpRttNoUpdate = 425
tcpRttUpdate =11258426 tcpTimRetrans =194800
tcpTimRetransDrop = 4 tcpTimKeepalive = 0
tcpTimKeepaliveProbe= 0 tcpTimKeepaliveDrop = 0
tcpListenDrop =300269 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 7
The tcpListenDrop value is still quite high, but it starts increasing before we see the errors in the logs, so it may be unrelated; I am not sure. Are there any other (TCP) parameters worth tuning to try to reduce the number of errors we are seeing? If not, is there a recommended way to diagnose this kind of issue?
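For what it's worth, this is the kind of live observation I was planning to try next to correlate the drops with the application errors. The DTrace mib probe name is my assumption, based on the counter name reported by netstat:

# re-display the TCP counters every 5 seconds during the test
netstat -s -P tcp 5 | egrep 'tcpListenDrop|tcpAttemptFails'
# count listen-queue drops per second with the DTrace mib provider
# (probe name assumed to match the netstat counter name)
dtrace -n 'mib:::tcpListenDrop { @drops = count(); } tick-1s { printa(@drops); trunc(@drops); }'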