First, I would like to note that despite its apparent simplicity, which comes from its ubiquity and the availability of good client libraries for every language, HTTP is in fact a complex protocol that involves multiple interacting tiers. Caching is no exception; see RFC 2616 Section 13. What follows covers only Last-Modified/If-Modified-Since, because ETag handling with gzip is another story.
Setup
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

import cherrypy


path = os.path.abspath(os.path.dirname(__file__))
config = {
    'global' : {
        'server.socket_host' : '127.0.0.1',
        'server.socket_port' : 8080,
        'server.thread_pool' : 8
    },
    '/static' : {
        'tools.gzip.on' : True,
        'tools.staticdir.on' : True,
        'tools.staticdir.dir' : os.path.join(path, 'static')
    }
}


if __name__ == '__main__':
    cherrypy.quickstart(config = config)
Then put a plain-text or HTML file in the static directory.
Experiment
Firefox and Chromium don't send cache-related headers on the first request, i.e. GET /static/some.html:
Accept-Encoding: gzip, deflate
Host: 127.0.0.1:8080
Response:
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 50950
Content-Type: text/html
Date: Mon, 15 Dec 2014 12:32:40 GMT
Last-Modified: Wed, 22 Jan 2014 09:22:27 GMT
Server: CherryPy/3.6.0
Vary: Accept-Encoding
On subsequent requests no network traffic occurs at all; Firebug shows the following cache entry:
Data Size: 50950
Device: disk
Expires: Sat Jan 17 2015 05:39:41 GMT
Fetch Count: 6
Last Fetched: Mon Dec 15 2014 13:19:45 GMT
Last Modified: Mon Dec 15 2014 13:19:44 GMT
Because CherryPy by default doesn't provide an expiration time (Expires or Cache-Control), Firefox (and likely Chromium too) falls back to the heuristic described in RFC 2616 Section 13.2.4:
If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section
14.9.3) appears in the response, and the response does not include other restrictions
on caching, the cache MAY compute a freshness lifetime using a heuristic...
Also, if the response does have a Last-Modified time, the heuristic expiration value
SHOULD be no more than some fraction of the interval since that time. A typical setting
of this fraction might be 10%.
Here's code that demonstrates the heuristic nature of the Expires value:
import email.utils
import datetime

s = 'Wed, 22 Jan 2014 09:22:27 GMT'
lm = datetime.datetime(*email.utils.parsedate(s)[0:6])
# Current time plus 10% of the time elapsed since Last-Modified
now = datetime.datetime.utcnow()
print(now + (now - lm) / 10)
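The heuristic can also be checked against the fixed timestamps captured above instead of the current time. The sketch below assumes that the browser anchors the calculation at the response's Date header; that is an assumption on my part, but it reproduces the Expires value reported by Firebug to the second. The helper names parse_http_date and heuristic_expires are made up for illustration.

```python
import datetime
import email.utils


def parse_http_date(value):
    """Parse an HTTP date string into a naive datetime in GMT."""
    return datetime.datetime(*email.utils.parsedate(value)[0:6])


def heuristic_expires(date_header, last_modified_header):
    """Return Date plus 10% of the interval between Last-Modified and Date.

    Assumption: the cache measures the interval up to the response's
    Date header, not up to the moment the entry is inspected.
    """
    date = parse_http_date(date_header)
    last_modified = parse_http_date(last_modified_header)
    return date + (date - last_modified) / 10


# Date and Last-Modified headers from the response shown earlier
expires = heuristic_expires('Mon, 15 Dec 2014 12:32:40 GMT',
                            'Wed, 22 Jan 2014 09:22:27 GMT')
print(expires.replace(microsecond=0))  # 2015-01-17 05:39:41
```

The result matches "Expires: Sat Jan 17 2015 05:39:41 GMT" in the cache entry, which is about 32.7 days after the response date, i.e. 10% of the roughly 327 days since the file was last modified.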
When you refresh the page, the browser adds Cache-Control to the request:
Accept-Encoding: gzip,deflate
Cache-Control: max-age=0
Host: 127.0.0.1:8080
If-Modified-Since: Wed, 22 Jan 2014 09:22:27 GMT
CherryPy replies 304 Not Modified if the file hasn't been changed. Here's how it works:
cherrypy.lib.static.serve_file
def serve_file(path, content_type=None, disposition=None, name=None, debug=False):
    # ...
    try:
        st = os.stat(path)
    except OSError:
        if debug:
            cherrypy.log('os.stat(%r) failed' % path, 'TOOLS.STATIC')
        raise cherrypy.NotFound()
    # ...
    # Set the Last-Modified response header, so that
    # modified-since validation code can work.
    response.headers['Last-Modified'] = httputil.HTTPDate(st.st_mtime)
    cptools.validate_since()
    # ...
cherrypy.lib.cptools.validate_since
def validate_since():
    """Validate the current Last-Modified against If-Modified-Since headers.
    If no code has set the Last-Modified response header, then no validation
    will be performed.
    """
    response = cherrypy.serving.response
    lastmod = response.headers.get('Last-Modified')
    if lastmod:
        # ...
        since = request.headers.get('If-Modified-Since')
        if since and since == lastmod:
            if (status >= 200 and status <= 299) or status == 304:
                if request.method in ("GET", "HEAD"):
                    raise cherrypy.HTTPRedirect([], 304)
                else:
                    raise cherrypy.HTTPError(412)
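Note that the excerpt compares the two header values as strings, for exact equality. Here is a minimal standalone sketch of that comparison outside CherryPy. The helper not_modified is hypothetical, and the standard library's email.utils.formatdate is used as a stand-in for httputil.HTTPDate, which I assume produces the same GMT date format.

```python
import calendar
import email.utils


def http_date(timestamp):
    """Format a POSIX timestamp as an HTTP date in GMT."""
    return email.utils.formatdate(timestamp, usegmt=True)


def not_modified(if_modified_since, mtime):
    """Hypothetical helper: True when a GET or HEAD request would get 304."""
    # Like the validate_since excerpt, compare the header strings for
    # exact equality instead of parsing the dates and testing "older than".
    return bool(if_modified_since) and if_modified_since == http_date(mtime)


# File modification time matching the Last-Modified header from the experiment
mtime = calendar.timegm(email.utils.parsedate('Wed, 22 Jan 2014 09:22:27 GMT'))

print(not_modified('Wed, 22 Jan 2014 09:22:27 GMT', mtime))  # True
print(not_modified('Thu, 23 Jan 2014 09:22:27 GMT', mtime))  # False
print(not_modified(None, mtime))                             # False
```

The second call shows the consequence of string equality: a date later than the file's modification time still doesn't produce 304, because it isn't the exact value the server sent earlier. Browsers echo the Last-Modified value back verbatim, so in practice this works.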
Wrap up
With tools.staticdir, CherryPy neither sends nor gzips the file contents for requests that come with a valid If-Modified-Since header; it only asks the filesystem for the modification time and responds with 304 Not Modified. Without a page refresh it won't even receive a request, because browsers use the heuristic for expiration time when the server doesn't provide one. Of course, making your configuration more deterministic by providing a cache time-to-live won't hurt, for example:
'/static' : {
    'tools.gzip.on' : True,
    'tools.staticdir.on' : True,
    'tools.staticdir.dir' : os.path.join(path, 'static'),
    'tools.expires.on' : True,
    'tools.expires.secs' : 3600 # expire in an hour
}