
I'm not an expert network engineer, so I hope my question does not seem vague or naïve.

Multiplexing in HTTP/2 seems to use a single TCP connection for multiple, different requests concurrently, so that the head-of-line blocking problem is avoided. I was wondering how that works with, or overlaps, the underlying TCP connection in terms of data reassembly.

TCP also makes sure that the data (D) received on the receiver side is reconstructed even if the packets that constitute D arrive out of order (or are lost), so that D is rebuilt on the receiving side and then handed over to the application.

My question is: how does the notion of frames in HTTP/2 fit over/with TCP packet reassembly to make up a whole message on the receiving side? Which one takes place first? Or, what kind of mapping exists between frames and packets (one-to-one, one-to-many, etc.)? In a nutshell, how do they work together?

stdout

2 Answers


HTTP/2 frames are sent as one or more TCP packets, in the same way that TCP packets are ultimately sent as IP packets.

This does mean that even though HTTP/2 has multiplexing at the application layer (HTTP), it does not have truly independent streams at the transport layer (TCP). One issue with HTTP/2 is that we have just moved the head-of-line (HOL) blocking problem from the HTTP layer to the TCP layer.
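To make the framing concrete, here is a minimal sketch of parsing the 9-byte HTTP/2 frame header defined in RFC 7540, section 4.1. TCP just delivers these bytes; it is the HTTP/2 layer that interprets them. The example frame is made up for illustration.

```python
def parse_frame_header(buf: bytes):
    """Parse the 9-byte HTTP/2 frame header (RFC 7540, section 4.1).

    Layout: 24-bit payload length, 8-bit type, 8-bit flags,
    then 1 reserved bit + 31-bit stream identifier.
    """
    length = int.from_bytes(buf[0:3], "big")
    frame_type = buf[3]
    flags = buf[4]
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF  # clear reserved bit
    return length, frame_type, flags, stream_id

# A hypothetical DATA frame (type 0x0) on stream 1, carrying 5 payload bytes,
# with the END_STREAM flag (0x1) set:
header = (5).to_bytes(3, "big") + bytes([0x0, 0x1]) + (1).to_bytes(4, "big")
print(parse_frame_header(header))  # (5, 0, 1, 1)
```

The stream identifier in every frame header is what lets the receiver sort interleaved frames back into their streams.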

Let's look at an example: a web page that needs to download 10 images to display.

Under HTTP/1.1 the browser would open a TCP connection, fire off the first request, and then be stuck: it could not use that TCP connection to make subsequent requests, despite the fact that the connection was doing nothing until it got a response and there was nothing stopping it at the TCP layer. It was purely an HTTP restriction, primarily because HTTP/1 was text based, so mixing up bits of different requests wasn't possible. HTTP/1.1 did have the concept of HTTP pipelining, which allowed subsequent requests to be sent, but the responses still had to come back in order, and it was very poorly supported. Instead, as a workaround, browsers opened multiple connections (typically 6), but that had many downsides too (slow to create, slow to get up to speed, and no way to prioritise across them).

HTTP/2 allows those subsequent requests to be sent on the same TCP connection, and then bits of all the responses to be received back in any order and pieced together for processing. So the first image requested might actually be the last received. This is especially useful for slow connections (where the delay in sending is a significant chunk of the total time taken) or when the server takes a while processing some requests compared to others (e.g. if the first image has to be fetched from disk but the second is already available in a cache, then why not use the connection to send that second image). This is why HTTP/2 is generally faster and better than HTTP/1.1 - because it uses the TCP connection better and is not as wasteful.
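The "piece them together" step can be sketched as a simple demultiplexer: frames arrive interleaved on the one connection, and the receiver sorts their payloads into per-stream buffers by stream identifier. The frames here are simplified to (stream_id, payload) pairs; a real implementation parses the frame header, and the names are illustrative.

```python
from collections import defaultdict

def demux(frames):
    """Reassemble interleaved (stream_id, payload) frames into per-stream bytes."""
    streams = defaultdict(bytearray)
    for stream_id, payload in frames:
        streams[stream_id] += payload  # frames of one stream arrive in order
    return {sid: bytes(buf) for sid, buf in streams.items()}

# Responses for streams 1 and 3 interleaved on the same TCP connection:
wire = [(1, b"<ht"), (3, b"\x89PN"), (1, b"ml>"), (3, b"G...")]
print(demux(wire))  # {1: b'<html>', 3: b'\x89PNG...'}
```

Because each frame carries its stream id, the server is free to interleave them in whatever order suits it, and the client can still reconstruct every response.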

However, because TCP is a guaranteed, in-order protocol that has no idea what the higher-level application (HTTP) is using it for, this does introduce some problems for HTTP/2 if a TCP packet gets lost.

Let's say those 10 images all come back in order, but a packet from the first image is lost. In theory, if HTTP/2 were truly made up of independent streams, the browser could display the last 9 images, wait for the missing TCP packet to be retransmitted, and then display the first image. Instead, what happens is that all 10 images are held up waiting for that missing TCP packet to be resent, before TCP lets the upper HTTP layer know which messages have been received.
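A toy model of the TCP receive buffer shows why: the receiver may only hand the application the longest contiguous prefix of the byte stream, so bytes after a hole sit in the buffer even when they belong to a completely different HTTP/2 stream. The sequence numbers and payloads below are invented for illustration.

```python
def deliverable(segments, next_seq=0):
    """segments: dict of sequence number -> payload.
    Return only the contiguous bytes TCP may release to the application."""
    out = bytearray()
    while next_seq in segments:
        payload = segments[next_seq]
        out += payload
        next_seq += len(payload)
    return bytes(out)

# The segment at sequence number 11 (second half of image 1) was lost:
received = {0: b"img1-part1/", 22: b"img2-whole/"}
print(deliverable(received))  # b'img1-part1/' -- image 2 is buffered, not delivered

received[11] = b"img1-part2/"  # the retransmission finally arrives
print(deliverable(received))  # b'img1-part1/img1-part2/img2-whole/'
```

Even though image 2 arrived intact, TCP cannot release it until the hole before it is filled - that is the HOL blocking moved down to the transport layer.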

So in a lossy environment, HTTP/2 can perform significantly worse than HTTP/1.1 with 6 connections.

This was all known at the time HTTP/2 was being created, but in most cases HTTP/2 was faster, so it was released anyway, with the fix for that case left for later.

HTTP/3 looks to solve this remaining case. It does this by moving away from TCP to a new protocol called QUIC, which, unlike TCP, is designed with multiplexing built in. QUIC is built upon UDP, rather than trying to create a whole new low-level protocol, as UDP is already well supported. But QUIC is very complicated and will take a while to arrive, which is why they did not hold up HTTP/2 to wait for it and instead released what they had as a step along the way.

Barry Pollard
  • I was thinking that the 10 requests are entirely independent of each other, meaning that each request has its own sub-TCP-connection in a way, so that TCP can identify them one by one. But apparently that will come with HTTP/3. – stdout Mar 30 '20 at 12:28
  • 1
    Correct. They are independent of each other at HTTP layer but not at TCP layer. Though even that’s not 100% true as the server can prioritise across streams to decide which order to send the packets in - but that’s the good side of dependencies. – Barry Pollard Mar 30 '20 at 12:49
  • So, to TCP, all the requests (over the same connection) are like one. If one packet gets lost, it's a blocker. And if I want to reuse or pool the connection after I'm done, then presumably I need to set a flag on that TCP connection to reset it. – stdout Mar 30 '20 at 13:14
  • 1
    HTTP/2 connections are kept-alive by default. The spec is a little vague on this saying " HTTP/2 connections are persistent. For best performance, it is expected that clients will not close connections until it is determined that no further communication with a server is necessary (for example, when a user navigates away from a particular web page) or until the server closes the connection." https://tools.ietf.org/html/rfc7540#section-9.1 – Barry Pollard Mar 30 '20 at 13:27
  • Great! One thing I still don't quite get: if the underlying TCP connection only knows about TCP segments, how does the TCP socket on the receiving side (for instance, in the browser) know whether the entire request (including the sub-requests for images, CSS etc.) is complete or not, so that it can keep blocking? Do the `read` calls have anything to do with it? It feels like an "independent" stream is not really "truly" independent, because they all go through a single `read` call. – stdout Apr 01 '20 at 10:05
  • 1
    As long as the TCP packets are received in order the TCP stack can release them to the application (the browser in this case) and it sees them as a stream of bytes. So the browser sees the streams as independent even though they aren’t 100% independent. Many resources (e.g. HTML, progressive JPEGs) can be processed before the full file is received. This was true under HTTP/1 and HTTP/2. Other resources (e.g. JavaScript, CSS, non-Progressive JPEGs) can’t really be processed in this way and the browser just reads and thenbuffers the bytes until it has the full file. – Barry Pollard Apr 01 '20 at 10:14
  • In HTTP/1 it was a double newline which defined the end of the headers, and then the Content-Length header which defined the length of the body. In HTTP/2 there is also an END_STREAM flag which states whether this is the last HTTP/2 frame for that stream. – Barry Pollard Apr 01 '20 at 10:17
  • 1
    I can recommend a good book on this subject :-) https://www.manning.com/books/http2-in-action – Barry Pollard Apr 01 '20 at 10:18
  • Thanks! Definitely will check that out :) – stdout Apr 02 '20 at 09:49
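The two end-of-message mechanisms mentioned in the comments above can be sketched for the HTTP/1 case: find the blank line that ends the headers, then read exactly Content-Length body bytes. The message bytes below are made up, and this ignores chunked encoding for simplicity.

```python
def http1_message_end(raw: bytes) -> int:
    """Return the byte offset where an HTTP/1 message with Content-Length ends."""
    header_end = raw.index(b"\r\n\r\n") + 4          # double newline ends the headers
    headers = raw[:header_end].decode("ascii")
    for line in headers.split("\r\n"):
        if line.lower().startswith("content-length:"):
            return header_end + int(line.split(":", 1)[1])
    return header_end                                 # no body

msg = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloEXTRA"
end = http1_message_end(msg)
print(msg[:end])  # the complete message, excluding the next message's bytes
```

In HTTP/2 the equivalent signal is per-stream: a frame carrying the END_STREAM flag marks that stream as finished, independently of the others on the connection.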

A series of HTTP/2 frames (whether they belong to the same stream or to different streams does not matter) is just a series of bytes.

TCP does not interpret those bytes. The TCP sender just packs the bytes into TCP segments and sends them along. The TCP receiver receives the segments and reassembles the bytes, which happen to form a series of HTTP/2 frames.

TCP and HTTP/2 don't really work together in the sense that TCP is not aware of what it is transporting - it's just a series of opaque bytes.

As such, there is no fixed mapping between TCP segments and HTTP/2 frames.

Consider that in most cases HTTP/2 is encrypted, so you have TCP transporting opaque bytes that happen to be TLS record bytes (possibly fragmented - i.e. a TCP segment may contain 1.5 TLS records, with the remaining TLS record bytes in a subsequent segment); each TLS record in turn contains opaque bytes that happen to be HTTP/2 frame bytes (possibly fragmented as well).
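This is why an HTTP/2 frame parser must tolerate arbitrary chunk boundaries: it buffers bytes until a whole frame (length prefix plus payload) is available, however TCP happened to segment the stream. The sketch below uses a toy length-only framing rather than the full 9-byte header, and the frame contents are invented.

```python
def frames_from_chunks(chunks):
    """Reassemble length-prefixed frames from arbitrarily split byte chunks."""
    buf = bytearray()
    frames = []
    for chunk in chunks:  # chunks = however the transport happened to split the bytes
        buf += chunk
        # Emit every complete frame currently in the buffer:
        while len(buf) >= 3 and len(buf) >= 3 + int.from_bytes(buf[0:3], "big"):
            length = int.from_bytes(buf[0:3], "big")
            frames.append(bytes(buf[3:3 + length]))
            del buf[:3 + length]
        # Any leftover bytes in buf are a fragment of the next frame.
    return frames

stream = (4).to_bytes(3, "big") + b"AAAA" + (2).to_bytes(3, "big") + b"BB"
# Split at an arbitrary point: 1.5 frames in the first chunk, the rest in the second.
print(frames_from_chunks([stream[:9], stream[9:]]))  # [b'AAAA', b'BB']
```

The same two frames come out whether the stream arrives in one chunk, two uneven chunks, or one byte at a time - which is exactly the sense in which TCP and HTTP/2 framing are independent.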

sbordet
  • So each layer is responsible for keeping track of its own frames (per request) so that they can be reassembled on the receiving side? – stdout Mar 30 '20 at 12:12
  • 1
    Correct, each layer packs its frames, and the lower layers just interprets these packed frames from the upper layer as an opaque stream of bytes. – sbordet Mar 30 '20 at 14:07
  • Thank you for the clear explanation – Kiran Jan 04 '22 at 14:39