How to handle chunked encoding in Python BaseHTTPRequestHandler?

Question

I have the following simple web server, utilizing Python's http module:

import http.server
import hashlib


class RequestHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self):
        md5 = hashlib.md5()

        remaining = int(self.headers['Content-Length'])
        while remaining > 0:
            data = self.rfile.read(min(remaining, 16384))
            if not data:
                break  # client closed the connection early
            # Hash every block, including the last one, before looping again
            md5.update(data)
            remaining -= len(data)
        print(md5.hexdigest())

        self.send_response(204)
        self.send_header('Connection', 'keep-alive')
        self.end_headers()


server = http.server.HTTPServer(('', 8000), RequestHandler)
server.serve_forever()

When I upload a file with curl, this works fine:

curl -vT /tmp/test http://localhost:8000/test

Because the file size is known upfront, curl sends a Content-Length: 5 header, so I know how many bytes to read from the socket.

But if the file size is unknown, or the client decides to use chunked Transfer-Encoding, this approach fails.

It can be simulated with the following command:

curl -vT /tmp/test -H "Transfer-Encoding: chunked" http://localhost:8000/test

If I read from self.rfile past the end of the body, the read blocks forever and hangs the client until it breaks the TCP connection; at that point self.rfile.read returns empty data and the loop exits.

What would be needed to extend the above example to support chunked Transfer-Encoding as well?

j1elo · Answer 1 · 2020-11-22T17:57:11.680

As you can see in the description of Transfer-Encoding, a chunked transmission will have this shape:

chunk1_length\r\n
chunk1 (binary data)
\r\n
chunk2_length\r\n
chunk2 (binary data)
\r\n
0\r\n
\r\n
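To make the layout above concrete, here is a small sketch that produces it from a byte string. The helper name `encode_chunked` is made up for this illustration; the one detail worth noticing is that each chunk length is written in hexadecimal.

```python
def encode_chunked(data, chunk_size):
    """Frame `data` as a chunked body, using chunks of up to `chunk_size` bytes."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk)   # chunk length in hexadecimal, then CRLF
        out += chunk + b"\r\n"          # chunk data, then CRLF
    out += b"0\r\n\r\n"                 # zero-size chunk plus final empty line
    return bytes(out)


encoded = encode_chunked(b"hello world", 5)
print(encoded)  # b'5\r\nhello\r\n5\r\n worl\r\n1\r\nd\r\n0\r\n\r\n'
```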

You just have to read one line to get the next chunk's size (a hexadecimal number), then consume both the binary chunk and the newline that follows it.
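The reading procedure can be tried out in isolation, without a server, on any binary file-like object. The sketch below is only an illustration: `iter_chunks` is a hypothetical name, and skipping chunk extensions (anything after a `;` on the size line) and trailer lines is an extra precaution I am assuming is wanted, since curl sends neither.

```python
import io


def iter_chunks(stream):
    """Yield the payload of each chunk read from a binary file-like object."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ConnectionError("stream ended before the terminating chunk")
        # Ignore optional chunk extensions ("1a;name=value"); parse the hex size.
        chunk_length = int(size_line.split(b";", 1)[0].strip(), 16)
        if chunk_length == 0:
            break
        data = stream.read(chunk_length)
        if len(data) != chunk_length:
            raise ConnectionError("stream ended in the middle of a chunk")
        yield data
        stream.readline()  # consume the CRLF that follows the chunk data
    # After the zero-size chunk: optional trailer lines, then an empty line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass


# A chunked body followed by bytes that belong to whatever comes next.
wire = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\nNEXT")
body = b"".join(iter_chunks(wire))
rest = wire.read()
print(body)  # b'hello world'
print(rest)  # b'NEXT'
```

Because the final empty line is consumed as well, the stream is left positioned exactly at the start of the next request, which matters on persistent connections.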

This example handles requests with either a Content-Length header or a Transfer-Encoding: chunked header.

from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8080

class TestHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_PUT(self):
        self.send_response(200)
        self.end_headers()

        path = self.translate_path(self.path)

        if "Content-Length" in self.headers:
            content_length = int(self.headers["Content-Length"])
            body = self.rfile.read(content_length)
            with open(path, "wb") as out_file:
                out_file.write(body)
        elif "chunked" in self.headers.get("Transfer-Encoding", ""):
            with open(path, "wb") as out_file:
                while True:
                    line = self.rfile.readline().strip()
                    chunk_length = int(line, 16)

                    if chunk_length != 0:
                        chunk = self.rfile.read(chunk_length)
                        out_file.write(chunk)

                    # Each chunk is followed by an additional empty newline
                    # that we have to consume.
                    self.rfile.readline()

                    # Finally, a chunk size of 0 is an end indication
                    if chunk_length == 0:
                        break

httpd = HTTPServer(("", PORT), TestHTTPRequestHandler)

print("Serving at port:", httpd.server_port)
httpd.serve_forever()

Note that I chose to inherit from SimpleHTTPRequestHandler instead of BaseHTTPRequestHandler because SimpleHTTPRequestHandler.translate_path() then lets clients choose the destination path (which may or may not be useful, depending on the use case; my example was already written to use it).
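For the handler in the question, which hashes the body instead of saving it, the same loop can feed an MD5 object in bounded blocks so that a large chunk is never held in memory at once. This is a rough sketch, not a drop-in replacement: `hash_body` and its `block_size` parameter are names invented here, and `headers` is assumed to be any mapping with a `.get()` method, such as `self.headers`.

```python
import hashlib
import io


def hash_body(headers, rfile, block_size=16384):
    """Return the MD5 hex digest of a request body read from `rfile`."""
    md5 = hashlib.md5()

    def feed(count):
        # Read exactly `count` bytes in bounded blocks, hashing as we go.
        while count > 0:
            data = rfile.read(min(count, block_size))
            if not data:
                raise ConnectionError("client closed the connection early")
            md5.update(data)
            count -= len(data)

    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        while True:
            size_line = rfile.readline()
            if not size_line:
                raise ConnectionError("stream ended before the last chunk")
            chunk_length = int(size_line.split(b";", 1)[0].strip(), 16)
            if chunk_length == 0:
                break
            feed(chunk_length)
            rfile.readline()  # CRLF after the chunk data
        # Skip optional trailers up to the empty line that ends the body.
        while rfile.readline() not in (b"\r\n", b"\n", b""):
            pass
    else:
        feed(int(headers.get("Content-Length", 0)))
    return md5.hexdigest()


expected = hashlib.md5(b"hello world").hexdigest()
plain = hash_body({"Content-Length": "11"}, io.BytesIO(b"hello world"))
chunked = hash_body(
    {"Transfer-Encoding": "chunked"},
    io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"),
)
print(plain == expected, chunked == expected)  # True True
```

Inside do_PUT this would be called as `hash_body(self.headers, self.rfile)` before sending the 204 response.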

You can test both operation modes with curl commands, as you mentioned:

# PUT with "Content-Length":
curl --upload-file "file.txt" \
  "http://127.0.0.1:8080/uploaded.txt"

# PUT with "Transfer-Encoding: chunked":
curl --upload-file "file.txt" -H "Transfer-Encoding: chunked" \
  "http://127.0.0.1:8080/uploaded.txt"

There's a slight bug in your chunk handling. Please add `self.rfile.readline()` after the `chunk_length == 0` check (the one before the break in your loop) because there are still '\r\n' bytes on the wire to end the chunk stream. If someone (like me) wants persistent connections, the next time the framework calls `handle_one_request`, it will read those two remaining bytes on the wire, think something is wrong, and close the connection. Thanks for the code though, it got me going in the right direction. — firebush, Nov 19 '20 at 04:35
Nice catch! I've modified the code in such a way that the trailing newline is always consumed, regardless of the size. I think this way is more clear how a size of 0 is actually an "end of transmission" indication, while the logic of the reader still stays the same for all chunks. — j1elo, Nov 22 '20 at 17:59
