
I'd like to set up a cache server for downloaded files. One twist is that I want it to work with HTTPS (including redirects from HTTP to HTTPS). I understand the usual problems with this, but the difference for me is that this does not need to be a transparent proxy. For example:

# Usually you'd do something like this:
curl --proxy myserver:8080 https://example.com/file.tar.gz

# But it's fine for our scripts to call something like this instead:
curl myserver:8080 --data-raw https://example.com/file.tar.gz

Note that here the client is specifically directing its request at myserver, so it's not going to try to verify that the response comes from example.com. (My server should, though!)

The other twist is that this will only be used for files that never change (the URLs include the version number), so the usual concerns about cache freshness don't apply. If the file (or redirect response) is cached, it should be returned without checking the internet at all. The cached copy should be deleted a fixed period after it is last requested, regardless of when it was first downloaded at our end.
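As a sketch of that eviction rule (the evict_stale helper is hypothetical; it assumes the cache stores each download as a plain file and that a cache hit updates the file's access time, i.e. the filesystem is not mounted with noatime):

```shell
# Delete cached files that have not been requested for more than ttl_days.
# Assumes each cache hit updates the file's atime (no noatime mount).
evict_stale() {
  cache_dir=$1
  ttl_days=$2
  # -atime +N matches files last accessed more than N*24 hours ago
  find "$cache_dir" -type f -atime +"$ttl_days" -delete
}
```

Run from cron, something like this enforces "delete N days after last request" independently of when the file was first downloaded.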

Question: I had hoped to use an HTTP proxy like Squid but I can't see how to configure it to do anything like this. Alternatively, writing a bit of code is an option but I'd prefer to avoid that. What could I do to establish a cache like this?

Background: This is to be used mostly for third-party libraries we'll use in our source code, when building Docker images and when developers are building outside of containers. Sometimes we currently check in third-party code to our own repos but this isn't ideal. I'm sure we're not the only people facing this problem but I can't find a good solution on the web ... maybe I'm just missing the right search term.

Arthur Tacca

1 Answer


This is possible, and requires three configuration changes:

  1. Configure SSL Bump to perform man-in-the-middle decryption, making the content available for caching
  2. Use Squid's refresh_pattern parameter to make sure Squid caches (and holds onto) the objects you want to store
  3. Adjust the maximum_object_size parameter to be at least as large as the largest file you plan to download
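A minimal squid.conf sketch of those three changes might look like this; the certificate path, cache sizes, and refresh_pattern values are illustrative assumptions, not tested settings:

```
# 1. SSL Bump: present dynamically generated certs signed by our own CA
http_port 8080 ssl-bump cert=/etc/squid/bump.pem generate-host-certificates=on
ssl_bump bump all

# 2. Cache aggressively, ignoring origin freshness headers
#    (min/max ages are in minutes; one year shown here)
refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload ignore-no-store ignore-private

# 3. Allow large archives into the cache
maximum_object_size 1 GB
cache_dir ufs /var/spool/squid 10240 16 256
```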

You should be aware that after configuring SSL Bump, Squid will generate and present a certificate, signed by its own CA, for every domain it is configured to bump. Your client must accept this certificate in order for the transfer to take place. cURL can do this with the --cacert option (pointing it at the bump CA certificate so that verification succeeds) or the -k option (which disables certificate verification completely).
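For example (assuming the bump CA has been exported to a file such as bump-ca.pem; the paths are hypothetical):

```
# Trust the bump CA explicitly (preferred):
curl --proxy myserver:8080 --cacert bump-ca.pem https://example.com/file.tar.gz

# Or disable certificate checking entirely (quick but unsafe):
curl --proxy myserver:8080 -k https://example.com/file.tar.gz
```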

Also, there's a subtle but important difference in the way the two cURL commands you posted work. The first causes cURL to open a connection as though it were speaking to a proxy, i.e. open the port, issue a CONNECT request, wait for the SSL handshake to take place, then start talking HTTP. The second invocation causes cURL to open a connection and attempt the SSL handshake straight away. In that case, the host running Squid must work out which origin server to connect to, usually from the SNI part of the handshake. Squid can do this, but you need to configure it with the https_port transparent statement. I would recommend the first method if you can: it requires less configuration on the Squid side, and it makes it clear on the client side that a proxy is involved.
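For the second style, the interception setup might look roughly like this (an untested sketch; in Squid 3.5+ the directive is spelled intercept rather than transparent, and reading the SNI requires a peek step):

```
# Clients connect straight to this port and start TLS; Squid peeks at
# the SNI hostname to decide which origin server to contact.
https_port 8443 intercept ssl-bump cert=/etc/squid/bump.pem generate-host-certificates=on

acl step1 at_step SslBump1
ssl_bump peek step1   # read the SNI first
ssl_bump bump all     # then decrypt
```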

djluko

  • The point of my question was that I really hoped to avoid SSL Bump. Notice that in the second curl example (which is the style I'd prefer), the target server is just specified as POST data, and the connection to our server isn't even HTTPS. (Actually it would be better if it were, but I deliberately didn't use it, to stress that this doesn't need to be proper HTTP proxying.) – Arthur Tacca Apr 08 '17 at 11:02
  • The message I'm getting from your answer is that what I want isn't possible with HTTP proxies like Squid. If I get time to continue this project, I'll play with Apache, especially mod_rewrite and mod_cache, post a new question if I run into trouble, and post an update back here. – Arthur Tacca Apr 08 '17 at 11:03
  • If you're not prepared to let Squid look inside the SSL tunnel, then you need to cache the entire HTTPS payload, which is pointless because it changes every time (new keys are exchanged, and encrypt the payload differently, even for the same underlying object). If you want to cache, you need to look at the HTTP, which is only possible if you decrypt/re-encrypt. Sorry! – djluko Apr 09 '17 at 21:49
  • If you were prepared to use URLs like `http://example.com/file.tar.gz` then you could do it with Squid's url_rewrite parameter. If that's possible, re-word your question and I'll have another go at answering it. – djluko Apr 09 '17 at 21:52