import ssl
import socket

ssl_context = ssl.create_default_context()
target = 'swapi.co' 
port = 443 
resource = '/api/people/1/'
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 

secure_client = ssl_context.wrap_socket(client, server_hostname=target)
send_str = 'GET {} HTTP/1.1\r\nHost: {}:{}\r\n\r\n'.format(resource, target, str(port))

secure_client.connect((target, port))
secure_client.send(send_str.encode()) 
print(send_str)

print(len(secure_client.recv(8192))) # 1282
print(len(secure_client.recv(8192))) # 5. Why?

Above is a simple Python program that sends an HTTP request to the Star Wars API over a TLS-wrapped TCP socket.

This is the request sent:

GET /api/people/1/ HTTP/1.1
Host: swapi.co:443

The response header contains Transfer-Encoding: chunked. When the first recv is executed, the header and the first chunk are obtained. However, to get the last chunk with the terminator sequence ("0\r\n\r\n"), a second recv must be called. What is the underlying cause of this behavior?

Sıddık Açıl

2 Answers

1

TCP is a protocol that provides a stream of bytes. It doesn't provide any way to "glue" bytes together into messages. The actual number of bytes you will receive when you call recv is arbitrary and will depend on all kinds of factors that vary such as the exact implementation of the other side, how quickly you got around to calling recv, the network's maximum message size, and so on. It doesn't mean anything.
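In practice this means the receiver has to loop over recv and decide for itself where a message ends. Here is a minimal sketch of that idea, reading until the blank line that ends an HTTP header block. The helper name `recv_until` and the local `socket.socketpair()` stand-in for a real connection are my own assumptions for illustration, not part of the question's code.

```python
import socket

def recv_until(sock, marker, bufsize=4096):
    """Call recv repeatedly until `marker` shows up in the accumulated bytes.

    Returns (data up to and including marker, any bytes received after it).
    """
    buffer = bytearray()
    while marker not in buffer:
        piece = sock.recv(bufsize)
        if not piece:  # an empty result means the peer closed the connection
            raise ConnectionError("connection closed before marker arrived")
        buffer += piece
    head, _, rest = bytes(buffer).partition(marker)
    return head + marker, rest

# Local demonstration: the sender splits the header block across two sends,
# and the receiver does not need to know or care where the split was.
left, right = socket.socketpair()
left.sendall(b"HTTP/1.1 200 OK\r\nTransfer-")
left.sendall(b"Encoding: chunked\r\n\r\n4\r\nWiki")
headers, leftover = recv_until(right, b"\r\n\r\n")
left.close()
right.close()
```

Whatever ends up in `leftover` is the start of the body and has to be handed to whatever parses the body next. How many bytes land there depends on timing, which is exactly the point.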

Since you indicated in your query that you support HTTP version 1.1, the server is permitted to use any encoding HTTP 1.1 clients are required to support. That includes this form of chunked encoding which uses one or more "chunks" of data, each preceded by a size indicator. This is convenient for cases where the output is generated by a script and the server won't know how big it is until the entire response is generated. This encoding scheme allows sending to begin immediately.
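To make the framing concrete, here is a rough sketch of a chunked-body decoder. It assumes the body is available as a binary file-like object (for a socket, something like `sock.makefile('rb')` would do; here an `io.BytesIO` stands in). The function name `read_chunked` is hypothetical, and the sketch simply discards chunk extensions and trailer fields.

```python
import io

def read_chunked(stream):
    """Decode a chunked HTTP body from a binary file-like object."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("stream ended before the terminating chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended inside a chunk")
        body += data
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer fields up to the blank line that ends the body.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

example = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
decoded = read_chunked(example)
```

Note that the decoder is driven entirely by the size lines in the data. It never looks at how many recv calls were needed to deliver the bytes.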

Don't claim HTTP 1.1 compliance in an HTTP query unless your code supports everything the HTTP 1.1 standard says a client "MUST" support.
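If you would rather not handle chunked encoding at all, one option is to send an HTTP/1.0 request, since a server must not send Transfer-Encoding in response to one, and then read until the server closes the connection. The sketch below assumes the server accepts HTTP/1.0 and closes after responding. The helper names are hypothetical, and a local `socket.socketpair()` with made-up response bytes stands in for the real server.

```python
import socket

def build_http10_request(resource, host):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(resource, host).encode("ascii")

def read_to_eof(sock, bufsize=4096):
    """Collect everything the peer sends until it closes the connection."""
    pieces = []
    while True:
        piece = sock.recv(bufsize)
        if not piece:  # empty result: the peer has closed its side
            break
        pieces.append(piece)
    return b"".join(pieces)

request = build_http10_request("/api/people/1/", "swapi.co")

# Local stand-in for a server: it writes a response in two pieces and closes.
server_side, client_side = socket.socketpair()
server_side.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello ")
server_side.sendall(b"world")
server_side.close()
response = read_to_eof(client_side)
client_side.close()
```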

David Schwartz
  • I do not claim HTTP 1.1 compliance in any sense. I know how chunked encoding splits up responses whose size is not known beforehand. What I was asking is how this non-overlapping behavior of chunks is achieved at the implementation level. – Sıddık Açıl Jul 12 '19 at 17:34
  • You claim HTTP 1.1 compliance by sending an "HTTP/1.1" in the query. That's what that is. What "non overlapping behavior of chunks" are you talking about? You mean that you happened to receive the last chunk in a separate call to `recv`. My first paragraph explains that. – David Schwartz Jul 12 '19 at 17:35
  • Well, I **happened to** receive the last chunk on a separate `recv` every time I executed this. I tried it on swapi.co and www.google.com just to be sure. Was it just a coincidence each time? – Sıddık Açıl Jul 12 '19 at 17:40
  • @SıddıkAçıl In a sense yes and in a sense no. It's likely Nagle's algorithm interacting with the implementation of their web server which likely uses a separate code path to send the final chunk. But it's coincidence in the sense that they could upgrade their web server tomorrow and it could change. Or a packet could get dropped one time you try it and it could change. – David Schwartz Jul 12 '19 at 17:44
  • I gather that this behavior **is not a standard** in any sense of the word but something that can be observed in production-grade HTTP servers. I will try and look into open source HTTP server implementations for chunked encoding. Thank you for your help. Cheers. – Sıddık Açıl Jul 12 '19 at 17:54
  • @SıddıkAçıl It wasn't engineered by anyone. It's just the way various pieces happen to come together most of the time. All it would take would be a code change to the HTTP server, a slight delay on the HTTP client due to an interrupt, or a packet to drop on the network and the behavior could change. – David Schwartz Jul 12 '19 at 17:55
  • Just to clarify for future visitors of this question, by "this behavior" in the previous comment, I meant the possibility of an http server executing `send` on a different `codepath` for the last chunk. The situation described above could change depending on the server, client or network status as clearly stated by @David Schwartz. It is not something to rely on when handling responses with `Transfer-Encoding: chunked`. – Sıddık Açıl Jul 12 '19 at 18:13
0

It's because in chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another.