
I am trying to understand the TCP reset problem mentioned in RFC 7230: HTTP/1.1 Message Syntax and Routing, § 6.6:

6.6. Tear-down

The Connection header field (Section 6.1) provides a "close" connection option that a sender SHOULD send when it wishes to close the connection after the current request/response pair.

So HTTP/1.1 has persistent connections, meaning that multiple HTTP request/response pairs can be sent on the same connection.

A client that sends a "close" connection option MUST NOT send further requests on that connection (after the one containing "close") and MUST close the connection after reading the final response message corresponding to this request.

A server that receives a "close" connection option MUST initiate a close of the connection (see below) after it sends the final response to the request that contained "close". The server SHOULD send a "close" connection option in its final response on that connection. The server MUST NOT process any further requests received on that connection.

So the client signals that it will close the connection by adding the Connection: close header field to the last HTTP request, and it closes the connection only after it receives the HTTP response acknowledging that the server received the request.

A server that sends a "close" connection option MUST initiate a close of the connection (see below) after it sends the response containing "close". The server MUST NOT process any further requests received on that connection.

A client that receives a "close" connection option MUST cease sending requests on that connection and close the connection after reading the response message containing the "close"; if additional pipelined requests had been sent on the connection, the client SHOULD NOT assume that they will be processed by the server.

So the server signals that it will close the connection by adding the Connection: close header field to the last HTTP response, and it closes the connection. But it closes the connection only after receiving which message acknowledging that the client received the HTTP response?

If a server performs an immediate close of a TCP connection, there is a significant risk that the client will not be able to read the last HTTP response. If the server receives additional data from the client on a fully closed connection, such as another request that was sent by the client before receiving the server's response, the server's TCP stack will send a reset packet to the client; unfortunately, the reset packet might erase the client's unacknowledged input buffers before they can be read and interpreted by the client's HTTP parser.

So in the case where the server initiates the close of the connection, if the server fully closes the connection right after sending the HTTP response with a Connection: close header field to an initial HTTP request, then the client may not receive that HTTP response because it received a TCP reset packet response to a subsequent HTTP request that it sent after the initial HTTP request. But how can the TCP reset packet response to the subsequent HTTP request precede the HTTP response to the initial HTTP request?

To avoid the TCP reset problem, servers typically close a connection in stages. First, the server performs a half-close by closing only the write side of the read/write connection. The server then continues to read from the connection until it receives a corresponding close by the client, or until the server is reasonably certain that its own TCP stack has received the client's acknowledgement of the packet(s) containing the server's last response. Finally, the server fully closes the connection.

So in the case where the server initiates the close of the connection, the server only closes the write side of the connection right after sending the HTTP response with a Connection: close header field to an initial HTTP request, and it closes the read side of the connection only after receiving a subsequent corresponding HTTP request with a Connection: close header field or after waiting for a period long enough to assume that it received a TCP message acknowledging that the client received the HTTP response. But why would the client send a subsequent corresponding HTTP request with a Connection: close header field after receiving the HTTP response with a Connection: close header field, whereas paragraph 5 states: ‘A client that receives a "close" connection option MUST cease sending requests on that connection’?

It is unknown whether the reset problem is exclusive to TCP or might also be found in other transport connection protocols.

Géry Ogam
  • It's a strange recommendation, and it doesn't solve any problem. (1) TCP RST is exactly what the client should get if it doesn't implement any application protocol correctly: it isn't the function of an RFC to evade that; and (2) unless the server actually reads whatever the client mistakenly sends after the first request the RST will still happen. – user207421 Mar 11 '21 at 00:06
  • @user207421 HTTP/1.1 allows *pipelining* (cf. [RFC 7230, § 6.3.2](https://tools.ietf.org/html/rfc7230#section-6.3.2)), so technically the client does not violate the application protocol by sending another request before receiving the response to the first request. So if the server wishes to close the connection after the response to the first request, it should wait a little to let the client read the response, so that the client's socket buffer is not erased beforehand by a TCP RST packet that the server sends after receiving the other pipelined request. In other words, it avoids data loss. – Géry Ogam Mar 11 '21 at 11:37
  • It can't prevent data loss, as the server can't know how long to wait. It will mitigate it. The reset will still happen. To prevent data loss completely the server would need to read until end of stream is received, indicating that the client has closed the stream: but the second request will still be lost. Or maybe the server shouldn't close unless the client requested it. – user207421 Mar 11 '21 at 22:20
  • @user207421 “It can't prevent data loss, as the server can't know how long to wait.” According to [Steffen Ullrich’s answer](https://stackoverflow.com/a/66573249/2326961), the server can wait until the client closes the connection, which the client will do after receiving the HTTP close response. So the connection close acts as a client acknowledgment for the server: it avoids the TCP RST packet being sent and thus prevents data loss (of the HTTP close response). – Géry Ogam Mar 12 '21 at 12:12
  • @user207421 “but the second request will still be lost” Yes, but this loss is not a problem since the server will *not* process that request (potentially modifying the target resource’s state), so the client can safely send it again after reconnecting. In other words, you cannot lose a request since the client can always send it again; you can only lose a response to a *processed* request, since the response depends on the target resource’s state on the server, which might be modified by the processing of the request. – Géry Ogam Mar 12 '21 at 12:13
  • @user207421 So more precisely, you can only lose a response to a processed *non-idempotent* request (i.e. a processed request which will modify the target resource’s state again in a future processing and therefore result in a different response than the original one). That is why [RFC 7230, § 6.3.2](https://tools.ietf.org/html/rfc7230#section-6.3.2) states that a client should not pipeline requests after a non-idempotent request until the response for that request has been received. – Géry Ogam Mar 12 '21 at 12:20

1 Answer


But why would the client send a subsequent corresponding HTTP request with a Connection: close header field after receiving the HTTP response with a Connection: close header field, whereas paragraph 5 states: ‘A client that receives a "close" connection option MUST cease sending requests on that connection’?

With HTTP pipelining the client can send new requests even though the response for a previous request (and thus the Connection: close in this response) was not yet received. This is a slight optimization from only sending the next request after the response for the previous one was received, but it comes with the risk that this new request will not be processed by the server.
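As a rough illustration of that risk, the following minimal sketch works out which pipelined requests a client should no longer expect answers for once a response carrying the "close" connection option arrives. The helper names `has_close_option` and `unanswered_after_close`, and the representation of responses as dictionaries with lowercase header names, are assumptions made for this example; they do not come from any HTTP library.

```python
def has_close_option(headers):
    """Return True if the Connection header field lists the "close" option.

    `headers` is a hypothetical dict mapping lowercase field names to values.
    Connection options are comma-separated and case-insensitive.
    """
    value = headers.get("connection", "")
    return any(token.strip().lower() == "close" for token in value.split(","))


def unanswered_after_close(pipelined, responses):
    """Return the pipelined requests that will not be answered on this connection.

    pipelined: the requests in the order they were sent (any objects).
    responses: header dicts in the order the responses were received.
    """
    for index, headers in enumerate(responses):
        if has_close_option(headers):
            # Everything sent after the request answered with "close"
            # has to be treated as not processed on this connection.
            return pipelined[index + 1:]
    return []  # no "close" seen so far: later responses may still arrive
```

For instance, with three pipelined requests and a "close" in the second response, only the third request is returned; whether to resend it on a new connection is then the client's decision.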

But how can the TCP reset packet response to the subsequent HTTP request precede the HTTP response to the initial HTTP request?

While the TCP RST is sent after the response, it is propagated early to the application. A TCP RST is sent if new data arrives at a socket that is already shut down at least for reading (i.e. close(fd) or shutdown(fd, SHUT_RD)). It is also sent if unprocessed data remains in the receive buffer of the socket at shutdown, as is the case with HTTP pipelining. Once the peer receives a TCP RST, its socket is marked as broken. On the next system call on this socket (typically a read or write), this error is delivered to the application, regardless of whether unread data remains in the receive buffer of the socket. That unread data is thus lost.

But it closes the connection only after receiving which message acknowledging that the client received the HTTP response?

It is not waiting for an application message from the client. It first delivers the response with the Connection: close, then reads on the socket in order to detect the client closing the connection. Then it closes the connection too. This wait for the close should of course have a short timeout, because on a disrupted connection the close might never be explicitly signaled. Alternatively, it could just wait a few seconds and hope that the client has received and processed the response in the meantime.
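The staged close described above could be sketched roughly as follows, using only Python's standard `socket` module. The function names `lingering_close` and `demo`, the 2-second timeout, and the canned `RESPONSE` are assumptions chosen for illustration, not what any particular server does; note also that the timeout here applies to each read rather than to the whole wait.

```python
import socket
import threading

RESPONSE = b"HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Length: 2\r\n\r\nok"


def lingering_close(conn, timeout=2.0):
    """Half-close, then read until the peer closes or a read times out."""
    conn.shutdown(socket.SHUT_WR)   # send FIN; the read side stays open
    conn.settimeout(timeout)        # per-read timeout (assumed value)
    peer_closed = False
    try:
        while True:
            if conn.recv(4096) == b"":   # EOF: the client closed its side
                peer_closed = True
                break
            # Any other data is a pipelined request: read and discard it.
    except OSError:
        pass                        # timed out or the connection broke
    finally:
        conn.close()                # now fully close the connection
    return peer_closed


def demo():
    """Run one server/client exchange on the loopback interface."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    result = {}

    def serve():
        conn, _ = listener.accept()
        conn.recv(4096)             # read (at least part of) the first request
        conn.sendall(RESPONSE)
        result["peer_closed"] = lingering_close(conn)

    thread = threading.Thread(target=serve)
    thread.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /a HTTP/1.1\r\nHost: x\r\n\r\n")
        client.sendall(b"GET /b HTTP/1.1\r\nHost: x\r\n\r\n")  # pipelined
        data = b""
        while chunk := client.recv(4096):   # read until the server's FIN
            data += chunk
    thread.join()
    listener.close()
    return data, result["peer_closed"]
```

Calling `demo()` returns the bytes the client read and whether the server observed the client's close. Because the server keeps reading after the half-close, the pipelined second request is drained instead of being left unread in the receive buffer when the socket is closed.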

Géry Ogam
Steffen Ullrich
  • Thank you! “With HTTP pipelining the client can send new requests even though the response for a previous request ([…]) was not yet received.” Yes but the RFC talks about “a corresponding close by the client”, meaning that the client **first** receives a close response and **then** sends a **corresponding** close request. This is confusing since: a) in HTTP the communication is always initiated by the client (request then response), not by the server (response then request); b) the RFC states: “A client that receives a "close" connection option MUST cease sending requests on that connection”. – Géry Ogam Mar 11 '21 at 10:04
  • “TCP RST is not some kind of application data which is delivered in order after the previous data.” Do you mean that the server may send **concurrently** a TCP RST packet in between the TCP packets for the HTTP close response? Or do you mean that the server always sends **sequentially** a TCP RST packet **after** the TCP packets for the HTTP close response, but the HTTP close response may still be in the client’s socket buffer (so not read yet) when the TCP RST packet arrives and therefore be erased? – Géry Ogam Mar 11 '21 at 11:44
  • @Maggyero: A TCP RST will be sent in order but not processed in order by the application. As soon as the kernel receives a RST, it marks the socket as broken, regardless of whether there is still unread data in the receive buffer. – Steffen Ullrich Mar 11 '21 at 11:50
  • @Maggyero: *"meaning that the client first receives a close response and then sends a corresponding close request"* - No. The client does not send a "close request". It just closes the socket after the last response was read so there is a **socket** close corresponding to the close **request**. – Steffen Ullrich Mar 11 '21 at 11:51
  • “A TCP RST will be sent in order but not processed in order by the application.” Okay, now I see. – Géry Ogam Mar 11 '21 at 13:55
  • “It just closes the socket after the last response was read so there is a **socket** close corresponding to the close **request**.” I think you meant “corresponding to the close **response**”. Okay, that clears up my confusion. Does it mean that the server is always **certain** whether the client received the close response, thanks to the client’s subsequent socket close? If so, what is the point of the RFC’s **wishful** alternative: “or until the server is reasonably certain that its own TCP stack has received the client's acknowledgement of the packet(s) containing the server's last response.”? – Géry Ogam Mar 11 '21 at 13:59
  • @Maggyero: yes, this was a typo. The server can detect if the client has closed the socket, because a read on the socket will return end-of-file (zero bytes) in this case. – Steffen Ullrich Mar 11 '21 at 14:08
  • Okay, so since the server has this means of detection (a kind of client acknowledgment), what is the point of the RFC’s wishful alternative (the timeout)? – Géry Ogam Mar 11 '21 at 14:15
  • @Maggyero: Whether the server reads on the connection to detect the close, as recommended, depends on the design of the server. Therefore there is an alternative way to proceed. Also, it should not wait forever, since no close will be detected if the connection is disrupted. – Steffen Ullrich Mar 11 '21 at 14:43
  • Thank you Steffen for the clarifications, everything makes sense now! Could you add them to the answer (for the future reader)? I am going to accept it. – Géry Ogam Mar 11 '21 at 14:49
  • As we had a fruitful interaction in this thread, can I request your networking expertise again for checking the answer that I posted [here](https://serverfault.com/a/1055695/361133) about the fundamental difference between a *proxy* and a *gateway* (two of the three *intermediaries* defined in RFC 7230)? I am not sure that I interpreted the RFC correctly and I could not find a clear interpretation on the web. If you wish I can open a new post on this so that I can give you the reputation points. – Géry Ogam Mar 12 '21 at 17:46