
I've written a primitive HTTP server for testing my Emscripten apps. It serves static files from the current directory. The catch is that I have large binary files (.data and .wasm), some of which rarely change, so it makes sense to have the browser cache them indefinitely.

Chrome sends If-None-Match for .html and .js files, but not for the .data file (~70 MB in my case). The .html and .js files therefore get a 304 as expected, while the .data file gets a 200 and is sent again in full, which is slow even on localhost.
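To make the expected exchange concrete, here is a minimal sketch of the decision a server makes on a conditional GET. It is only an illustration: `status_for` is a hypothetical helper name, the tags are written with the double quotes the HTTP spec requires, and splitting on commas assumes the tags themselves contain no commas (true for hex digests).

```python
def status_for(etag, if_none_match):
    """Return 304 if the client's If-None-Match value matches etag, else 200.

    etag is the server's current entity tag including its double quotes;
    if_none_match is the raw request header value, or None if absent.
    """
    if if_none_match is None:
        return 200
    if if_none_match.strip() == '*':
        # "*" matches any existing representation.
        return 304

    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    def strip_weak(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith('W/') else tag

    # Simplification: assumes no commas inside the tags themselves.
    candidates = [strip_weak(t) for t in if_none_match.split(',')]
    return 304 if strip_weak(etag) in candidates else 200


# First visit: no validator, so the full body is sent.
print(status_for('"abc123"', None))
# Revalidation with a matching tag: headers only.
print(status_for('"abc123"', '"abc123"'))
# The file changed on disk, so the tag no longer matches.
print(status_for('"def456"', '"abc123"'))
```

For the .data file, Chrome never gets as far as the second case, because it does not send the header at all.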

How do I force Chrome to cache large binary files?

import os
import hashlib
import http.server

root = '.'

mime = {
    '.manifest': 'text/cache-manifest',
    '.html': 'text/html',
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.svg': 'image/svg+xml',
    '.css': 'text/css',
    '.js': 'application/x-javascript',
    '.wasm': 'application/wasm',
    '.data': 'application/octet-stream',
}
mime_fallback = 'application/octet-stream'

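def md5(file_path):
    # Hash in 1 MiB chunks so large files are not read into memory at once.
    digest = hashlib.md5()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()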

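# Precompute ETags at startup; list and hash files relative to root, not the cwd.
cache = {os.path.join(root, f): md5(os.path.join(root, f)) for f in os.listdir(root) if any(map(f.endswith, mime)) and os.path.isfile(os.path.join(root, f))}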

class EtagHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self, body = True):
        self.protocol_version = 'HTTP/1.1'
        path = os.path.join(root, self.path.lstrip('/') + ('index.html' if self.path == '/' else ''))
        # Entity tags must be double-quoted (RFC 7232); None if the file was not hashed at startup.
        etag = '"%s"' % cache[path] if path in cache else None

        if not os.path.isfile(path):
            # end_headers() flushes the buffered status line; without it the client gets no response.
            self.send_response(404)
            self.send_header('Content-Length', '0')
            self.end_headers()

        elif etag is None or etag != self.headers.get('If-None-Match'):
            # Pick the MIME type by file extension, falling back to a generic binary type.
            content_type = next((t for ext, t in mime.items() if path.endswith(ext)), mime_fallback)
            with open(path, 'rb') as f:
                content = f.read()

            self.send_response(200)
            self.send_header('Content-Length', len(content))
            self.send_header('Content-Type', content_type)
            if etag is not None:
                # Uncached files are served without a validator instead of raising KeyError.
                self.send_header('ETag', etag)
            self.end_headers()
            self.wfile.write(content)

        else:
            # The client's copy is current: send headers only, no body.
            self.send_response(304)
            self.send_header('ETag', etag)
            self.end_headers()

if __name__ == '__main__':
    PORT = 8080
    print("serving at port", PORT)
    httpd = http.server.HTTPServer(("", PORT), EtagHandler)
    httpd.serve_forever()
Vadim Kantorov
  • I'm seeing this for ~40MB .data files. Firefox caches it with `last-modified` and `etag`, but Chrome won't. With the exact same headers, Chrome will cache a ~2MB .data file. – Andrew Olney Sep 13 '22 at 23:43
  • Last time I checked, Chrome still will cache ~30MB files, but this might have changed. Probably best to find this limit experimentally... – Vadim Kantorov Sep 15 '22 at 00:21
