Overall, I serve very small files. Think images and small videos. Caching these with Varnish is a breeze and doesn't give me any issues.

The problem I'm having is when downloading a 6 GB file. When doing so, I see the memory used by Varnish rise until the moment it crashes. Then it starts over until it crashes again.

  1. I want to prevent Varnish from crashing.
  2. The download is therefore paused every time and very slow. It should just download the 6 GB file. Period.

I already tried both file and RAM cache storage, but it made no difference. I was able to avoid a crash by limiting the transient memory: DAEMON_OPTS="-s Transient=malloc,512m"

However, this only postpones the problem until Varnish is using 512 MB, after which it crashes again.

As a test case, I've tried both of the following in vcl_backend_response:

  if (std.integer(beresp.http.Content-Length, 0) > 5242880) {
        set beresp.do_stream = true;
        return (deliver);
  }

and

  if (std.integer(beresp.http.Content-Length, 0) > 5242880) {
        set beresp.uncacheable = true;
        return (deliver);
  }

Neither of those, however, ensures that the file is downloaded cleanly by my browser.

varnishlog throws this error, but I guess it just means that memory got full and Varnish therefore crashed:

  FetchError Could not get storage

What am I missing here to keep the download from being halted? Is Varnish somehow caching the file anyway?

Note: HAProxy is running in front of Varnish. Apache is the actual web server.

P.T.

1 Answer

Counters

Please have a look at your storage counters by using varnishstat.

These are the counters that will help you understand what's going on:

SMA.s0.g_space
SMA.s0.g_bytes
SMA.Transient.g_bytes

g_space lets you know the available space, and g_bytes is the number of bytes in use. SMA is your malloc storage, and Transient refers to transient storage that is not part of your cache size.
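
For example, a one-shot dump of just those counters could look like this (s0 is the default name Varnish gives an unnamed -s malloc storage; adjust the names if yours differ):

  varnishstat -1 -f SMA.s0.g_space -f SMA.s0.g_bytes -f SMA.Transient.g_bytes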

Cache size

If you're processing objects that are 6 GB in size, your -s malloc setting should be at least 6 GB in size; otherwise the space cannot be allocated and Varnish will crash on you.

If your cache size is only barely bigger than 6 GB, Varnish will constantly have to nuke objects from the cache to free up space. Please make sure there's enough headroom in there.

Short-lived objects, with a TTL of 2 minutes or less, will never end up there and will occupy the transient storage instead.
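
As a rough sketch, using the DAEMON_OPTS format from the question, the sizing could look like this (the 8g and 1g values are placeholders; tune them to your traffic and available memory):

  DAEMON_OPTS="-s malloc,8g -s Transient=malloc,1g"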

File stevedore

There is a file stevedore that uses disk to store objects. It can be used when the total size of your cache exceeds the amount of memory you're willing to allocate to Varnish.

However, over time the file stevedore will slow you down, because it isn't really optimized for this. It suffers from disk fragmentation and doesn't perform well.
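
If you still want to try it, a minimal sketch of the daemon option could look like this (the path and the 100G size are assumptions; point it at a disk with enough free space):

  DAEMON_OPTS="-s file,/var/lib/varnish/varnish_storage.bin,100G"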

Massive Storage Engine

To tackle these storage issues, Varnish Software created the Massive Storage Engine (MSE). It is capable of storing petabytes of data and is written in such a way that it doesn't suffer from fragmentation or delays.

Unfortunately, this is not an open source stevedore. It is part of the Varnish Enterprise offering, which requires a license. However, our official cloud images (on AWS, Azure, GCP & OCI) give you the opportunity to work with Varnish Enterprise without buying a license ahead of time.

Don't cache large files

Another option is to prevent large files from being cached altogether.

Apparently, excluding large files based on their content length will not work. Currently, the only way to make sure no object storage memory is consumed for huge files is by calling return(pipe) in vcl_recv.

This is not an ideal solution, because you have to know ahead of time, based on the incoming request, that the response is going to be huge.

return(pipe) is a mechanism in Varnish to bypass the cache, but also to drop out of HTTP mode and into plain TCP mode. It is typically used for cases where an incoming request doesn't look like HTTP.
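
A minimal sketch of that approach, assuming the large downloads can be recognized by their URL (the /downloads/ prefix is purely an example):

  sub vcl_recv {
      if (req.url ~ "^/downloads/") {
          # Known-large files: bypass the cache entirely and pipe the
          # response straight from the backend to the client, so no
          # stevedore memory is consumed.
          return (pipe);
      }
  }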

Thijs Feryn
  • Thijs, thank you for your response. Unfortunately I also sometimes have files up to 60 GB that need to be downloaded, however these are rare (as is the 6 GB one). So I was hoping that one way or another, I could let those files pass by Varnish without "touching" them. According to your post, my only real solution is to either get a Varnish license, try to work with stevedore (with bad performance) or drop Varnish completely as a caching solution.... – P.T. Aug 27 '20 at 14:07
  • @P.T. I updated my answer, and added a part about preventing large files from being stored in cache. That being said, a lot of our clients who host video, or who create their own CDN do this using the *Massive Storage Engine*. It is worth considering. You can get free trials in the Cloud if you're interested. – Thijs Feryn Aug 28 '20 at 15:16
  • I've tried that solution, but unfortunately Varnish will still crash on memory when the file is being passed through if there is not enough memory. So in the case of the 60 GB file, I probably need 60 GB of memory in order to be able to let it pass along. – P.T. Aug 29 '20 at 08:03
  • @P.T. I talked to our R&D team, and apparently `set beresp.uncacheable = true` doesn't prevent the object from being (temporarily) stored in memory. The only way it really works is by executing `return(pipe)` in `vcl_recv`, which is not ideal. I updated my answer to elaborate on this. – Thijs Feryn Aug 30 '20 at 07:59
  • Thanks Thijs. That at least resolves the issue for now for me which allows me to look for a long term solution. – P.T. Aug 31 '20 at 06:59