You tagged TCP, but the included link is about HTTP. I'll assume you mean the question in an HTTP context, or more generally for any request/reply protocol running on top of TCP.
Regarding this:
note that thread-per-request does not mean that the framework has to
close the TCP connection between HTTP request
The threading strategy for handling IO on the connections usually depends on how you're doing IO. If you're doing blocking IO, you need (at least) one thread per connection, which means at least one thread that is blocked on a read() 99% of the time.
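To make the one-thread-per-connection model concrete, here is a minimal sketch of a blocking echo server (class and method names are illustrative, not from any framework). Each accepted connection gets its own dedicated thread, which spends most of its life blocked inside readLine():

```java
import java.io.*;
import java.net.*;

// Minimal sketch of the 1-thread-per-connection model with blocking IO.
// Each accepted connection gets a dedicated thread that spends most of
// its time blocked in a read.
public class BlockingEchoServer {
    private final ServerSocket serverSocket;

    public BlockingEchoServer() throws IOException {
        serverSocket = new ServerSocket(0); // ephemeral port
    }

    public int port() {
        return serverSocket.getLocalPort();
    }

    public void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket conn = serverSocket.accept();
                    // one new thread per connection
                    new Thread(() -> handle(conn)).start();
                }
            } catch (IOException ignored) { /* socket closed */ }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private void handle(Socket conn) {
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream()));
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) { // thread blocks here
                out.println("echo: " + line);
            }
        } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws Exception {
        BlockingEchoServer server = new BlockingEchoServer();
        server.start();
        try (Socket client = new Socket("localhost", server.port());
             PrintWriter out = new PrintWriter(client.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
            out.println("hello");
            System.out.println(in.readLine()); // "echo: hello"
        }
    }
}
```

Note that the handler thread is idle (blocked) whenever the client isn't sending anything, which is exactly the resource cost being discussed here.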
If that's your situation, there's no need to pursue one thread per request unless you want to serve more than one request concurrently.
If it is, you spawn a new thread per request to handle it (i.e. to produce the response). That per-request thread is on top of whatever threads you're using to handle the IO (read/write) on the underlying connection. Once a response has been produced, it has to be handed back to one of the threads doing IO. (Note that in HTTP/1.1, while the connection can be reused for multiple requests, there can be only one outstanding request at a time on a single connection, so you ultimately do not need one thread per request if you're already doing one thread per connection. This is not true for HTTP/2, which has multiplexing.)
That's a lot of ifs that must all hold for this approach to be worthwhile.
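The hand-off described above can be sketched without any networking: the per-request threads do the work, and a queue stands in for the connection's write path back on the IO side (all names here are illustrative):

```java
import java.util.concurrent.*;

// Sketch of "thread per request on top of thread per connection":
// each request is handled on a fresh thread, and finished responses
// are handed back to the IO side via a queue (standing in for the
// socket's write path).
public class PerRequestDispatch {

    // placeholder for the actual business logic
    public static String handle(String request) {
        return "response-to-" + request;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();
        String[] requests = {"req1", "req2"};

        for (String req : requests) {
            // costly: a brand-new thread per request
            new Thread(() -> writeQueue.add(handle(req))).start();
        }

        // the "IO thread" drains responses and would write() them out;
        // completion order is not guaranteed
        for (int i = 0; i < requests.length; i++) {
            System.out.println(writeQueue.take());
        }
    }
}
```

Spawning a raw Thread per request, as done here for clarity, is precisely the costly part; a real implementation would use a pool, as discussed below.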
The problem with this is that creating a thread is a costly operation. It makes sense only if producing a response takes a long time due to computation (i.e. you're CPU-bound) or requires blocking IO. But at that point I would not be using blocking IO to deal with the connection in the first place (i.e. I would ditch the 1 thread <-> 1 connection idea).
My intuition is that you're conflating 2 different things:
- Doing actual IO (reading and writing from sockets).
- Performing the actual business logic of "handling" a particular message in your server and eventually producing a response.
Personally, without knowing much beforehand, a safe bet is to use something like Netty for the IO (multi-threaded event loops doing non-blocking IO) and to offload long-running or blocking request handling to a fixed-size thread pool.
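Leaving Netty's own API aside, the shape of that design can be sketched with plain JDK classes: a small set of IO/event threads that never block, submitting slow or blocking work to a fixed-size pool (onRequest and the pool size are illustrative choices, not a real framework's API):

```java
import java.util.concurrent.*;

// The shape an event-loop framework gives you: IO threads never block;
// long or blocking work is offloaded to a bounded worker pool.
public class OffloadSketch {
    // fixed-size pool so blocking work can't spawn unbounded threads
    static final ExecutorService workers = Executors.newFixedThreadPool(4);

    // called from an event-loop/IO thread: it only submits, never blocks
    static Future<String> onRequest(String request) {
        return workers.submit(() -> {
            Thread.sleep(10); // stand-in for blocking or slow work
            return "handled:" + request;
        });
    }

    public static void main(String[] args) throws Exception {
        Future<String> f = onRequest("r1");
        System.out.println(f.get()); // "handled:r1"
        workers.shutdown();
    }
}
```

The key property is that the pool is bounded: a burst of slow requests queues up instead of exhausting OS threads, while the IO threads stay free to keep reading and writing.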
Blocking is not bad per se; it's bad when it wastes OS resources. A disclaimer for the future: when Project Loom lands in the JDK, I think there will be a resurgence of blocking APIs, and practices in this space are going to change a bit.
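For a sense of why, here is a hedged sketch of blocking-style code on Loom's virtual threads (this API shipped in later JDKs as Thread.ofVirtual() and Executors.newVirtualThreadPerTaskExecutor(); it requires a JDK with virtual threads, and details may differ on yours):

```java
import java.util.concurrent.*;

// With virtual threads, blocking is cheap: each task can block freely
// because a parked virtual thread does not pin an OS thread.
public class LoomSketch {
    public static void main(String[] args) throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> {
                Thread.sleep(5); // blocking here is fine on a virtual thread
                return "done on "
                        + (Thread.currentThread().isVirtual() ? "virtual" : "platform")
                        + " thread";
            });
            System.out.println(f.get());
        }
    }
}
```

In that world, the one-thread-per-request style stops being a resource problem, which is why blocking APIs are likely to come back into fashion.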