
I know there are a lot of questions and answers about the TIME_WAIT state of a socket connection, but somehow none of them (or, from an expert's perspective, maybe all of them) help me understand my problem or find possible solutions.

My scenario is that I have an internet-facing server that has to handle thousands of connections per second (it provides some type of API). I observe that connections to the server are generally possible, but, measured with curl, time_connect ranges from 0.009 s to 0.526 s and time_total from 0.134 s to 0.926 s. I also observe a massive number of connections in TIME_WAIT state: around 32,000+.
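
To cross-check a count like this, a small sketch that tallies sockets per TCP state straight from `/proc/net/tcp` and `/proc/net/tcp6` may help (roughly the same information `ss -tan` shows). It assumes the usual Linux layout of these files, where the first line is a header, the fourth column holds the state as a hex code, and `06` means TIME_WAIT; the function name `count_states` is just illustrative.

```python
from collections import Counter
from pathlib import Path

# Hex state codes as they appear in /proc/net/tcp (assumed Linux numbering).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(table_text: str) -> Counter:
    """Count sockets per TCP state in the text of /proc/net/tcp or tcp6."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():  # tcp6 is missing when IPv6 is disabled
            total += count_states(path.read_text())
    print("TIME_WAIT:", total["TIME_WAIT"])
```

Reading these files can take a moment when tens of thousands of sockets are listed, and they only cover the current network namespace.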

So my first question is: how many concurrent connections can a server with a default-configured Debian distribution handle within a second, a minute, or an hour? Is there some "simple" formula to calculate the logical limit of possible concurrent connections?
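
For the TIME_WAIT part of this, a rough estimate follows from Little's law: in steady state, the number of sockets sitting in TIME_WAIT is about the rate at which this host closes connections multiplied by how long each one stays there. The sketch below assumes Linux's fixed 60-second TIME_WAIT (`TCP_TIMEWAIT_LEN` in the kernel source) and that only the side that closes first enters TIME_WAIT. It estimates TIME_WAIT entries only; it is not a hard limit on concurrent connections.

```python
# Assumption: Linux keeps a socket in TIME_WAIT for a fixed 60 s, and only the
# side that closes the connection first (the active closer) enters TIME_WAIT.
TIME_WAIT_SECONDS = 60


def steady_state_time_wait(closes_per_second: float,
                           tw_seconds: float = TIME_WAIT_SECONDS) -> float:
    """Little's law: sockets in TIME_WAIT = close rate x time spent there."""
    return closes_per_second * tw_seconds


def implied_close_rate(time_wait_count: int,
                       tw_seconds: float = TIME_WAIT_SECONDS) -> float:
    """Invert the formula: which close rate would produce the observed count?"""
    return time_wait_count / tw_seconds


if __name__ == "__main__":
    print(steady_state_time_wait(1000))       # -> 60000
    print(round(implied_close_rate(32000)))   # -> 533
```

Under these assumptions, about 32,000 entries would correspond to roughly 533 server-initiated closes per second; connections that the clients close first leave their TIME_WAIT entry on the client side instead.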

Furthermore, in the future I may not handle an incoming connection directly on the server itself; instead, I may have to forward it to a proxy, get the result, and return it. I'm even thinking of using nginx's load-balancing mechanism (if needed). My second question is therefore: with such proxying, load balancing, or forwarding in place, how would the formula have to be modified?
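
When the server forwards requests to a proxy or upstream, it becomes a TCP client itself, and each upstream connection it closes first holds a local (ephemeral) port in TIME_WAIT. A rough ceiling on new upstream connections is then the number of usable ephemeral ports divided by the TIME_WAIT duration, per (source IP, destination IP, destination port) combination. The sketch assumes the range `32768 60999` when `/proc/sys/net/ipv4/ip_local_port_range` cannot be read, a 60-second TIME_WAIT, and `net.ipv4.tcp_tw_reuse` left disabled; `max_new_conns_per_second` is a hypothetical helper, not a formula from the kernel.

```python
# Assumptions (verify on your own host):
#   - ephemeral ports come from /proc/sys/net/ipv4/ip_local_port_range,
#     falling back to 32768-60999 if the file cannot be read
#   - the proxy closes upstream connections first, so each used port then sits
#     in TIME_WAIT for 60 s for that destination (tcp_tw_reuse disabled)
TIME_WAIT_SECONDS = 60


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (low, high) of the local ephemeral port range."""
    try:
        with open(path) as f:
            low, high = map(int, f.read().split())
        return low, high
    except OSError:
        return 32768, 60999  # assumed default when the file is unavailable


def max_new_conns_per_second(low, high, upstream_endpoints=1,
                             tw_seconds=TIME_WAIT_SECONDS):
    """Rough ceiling on new upstream connections per second.

    A local port can be reused towards a different (destination IP, port),
    so each distinct upstream endpoint gets its own pool of ports.
    """
    ports = high - low + 1
    return ports * upstream_endpoints / tw_seconds


if __name__ == "__main__":
    low, high = read_port_range()
    rate = max_new_conns_per_second(low, high)
    print(f"ports {low}-{high}: ~{rate:.0f} new conns/s per upstream endpoint")
    print(int(max_new_conns_per_second(32768, 60999)))      # -> 470
    print(int(max_new_conns_per_second(32768, 60999, 2)))   # -> 941
```

Reusing upstream connections instead of opening one per request (for example with nginx's `keepalive` directive in an `upstream` block) avoids most of this port churn.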

My last question is: what alternatives are there to increase the number of concurrent connections, e.g.,

  • add another server for the domain (that would double the number of concurrent connections, correct?),
  • decrease the time spent in TIME_WAIT (would that help, and how?),
  • ...

Thanks a lot for any help, or any reference!

Philipp
  • so *why* are you so hung up on TIME_WAIT? And what makes you ponder on concurrent connection limits when you think you are facing a scalability problem? Did you try a different TCP stack implementation instead (e.g. from a more recent 4.1+ Kernel)? – the-wabbit Aug 30 '17 at 21:40
  • Hi @the-wabbit, thanks for the comment! I'm so hung up on TIME_WAIT because I read in a lot of answers that reducing it would increase the number of possible concurrent connections. No, I did not try a different stack, because I first wanted to understand the limitations of the defaults I have in place, and to understand what should be possible with a default Debian installation and what affects the limits (e.g., TIME_WAIT?). – Philipp Aug 30 '17 at 21:43
  • Connections in TIME_WAIT state hog your resources, as each of them is a TCB entry in the TCP stack's implementation. Additionally, you might see scalability issues when TCB-related tables grow too large and the algorithms used for lookups or modifications in these tables do not scale well with large numbers of entries. But to put the numbers in proportion: with "thousands of connections per second", your 32K TIME_WAIT entries do not seem all that excessive. Your clients might help the situation by sending RST upon connection termination, though. – the-wabbit Aug 30 '17 at 21:53
  • You might find some interesting details on how connection data is held in Linux kernel memory in this article: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux – the-wabbit Aug 30 '17 at 21:58
  • @the-wabbit, thank you so much for the information and especially the latter link! I'm writing down all the information right now and doing some testing on my server(s). It looks promising, even without the need to reduce the `TIME_WAIT`. I'll write an answer, or edit the question, when I have more insights! – Philipp Aug 31 '17 at 16:40
