Scope and utility of Varnish bans

Question

I need to invalidate the cache of a set of URLs related to a resource that was either deleted/unpublished or updated. E.g., if my resource UUID is 1234abcd, I want to invalidate all its cached derivatives under /resource/1234abcd/*.

I understand that there is no way in Varnish to do this with purge, but it can be done with bans. However, I have a hard time understanding exactly how bans work.

If, e.g., I update resource 1234abcd and ban all its derivatives, I assume that the next client request to /resource/1234abcd/derv1 will trigger a new backend fetch, and that the new response will be cached. Will I have two versions of the same derivative, one banned and one not (until the old one expires and eventually its ban expires too)? If my resources have a long expiration date, I may accumulate a lot of bans for cached resources that I would much rather have cleared right away.

On a software design level, what is the utility of leaving inaccessible resources around instead of implementing a regex-based purge, which seems more straightforward to manage?

Also, I implemented a ban in my development env, and I see that a ban only takes effect a minute or so after being requested. Does this have to do with the ban lurker or some other timing setting?

Thanks.

IIRC, bans are history. Since you can pass a regular expression to purge, I'm afraid you understood wrongly that it couldn't be done. About "What is the utility of leaving inaccessible resources ...", it costs less energy. When I was working with Varnish, there was no way to list the contents of the cache, not even for Varnish itself. — Gerard H. Pille, Aug 07 '20 at 18:37
That's groundbreaking news. Could you point me to an example of using patterns with purge please? I'm looking at the more complex examples in https://varnish-cache.org/docs/trunk/users-guide/purging.html and I can't see how to do that. — user3758232, Aug 07 '20 at 19:51
Fake news often is groundbreaking. A good thing I started my comment with "IIRC", because I had it as wrong as could be. Scusi! Although I was a very intensive user of Varnish, I've never noticed the bans needing time to register. — Gerard H. Pille, Aug 09 '20 at 09:07

score 3 · Accepted Answer · answered Aug 10 '20 at 07:53

Bans & the ban list

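Bans in Varnish work through the so-called ban list. Each item on the ban list is an expression whose criteria ideally match properties of a cached object. When a request looks up an object, Varnish tests it against the bans that were added after the object was stored; if one matches, the object is discarded and a fresh backend fetch happens, so a banned copy is never served alongside its replacement.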

The ban lurker, a separate thread that monitors the ban list, is responsible for removing the matching objects in the background.

Bans & regex url matches

In most cases, you'll want to match a URL pattern that needs to be invalidated.

An easy way would be to issue the following ban:

req.http.host == example.com && req.url ~ /resource/1234abcd/.*

The problem with this example is the request scope: the ban lurker only has access to the object and its properties. Request information is not part of that, because an object only contains response information.

In this case, the lurker can't evaluate the item on the ban list, and the matching objects will remain in cache until the next user hits a matching URL. This is not efficient.

Lurker-friendly bans

A trick we use to bypass these scoping limitations is to add host and URL information to the response.

Here's how to do this:

sub vcl_backend_response {
  set beresp.http.url = bereq.url;
  set beresp.http.host = bereq.http.host;
}

sub vcl_deliver {
  unset resp.http.url;
  unset resp.http.host;
}

You could then run the following ban:

obj.http.host == example.com && obj.http.url ~ /resource/1234abcd/.*

The lurker would be able to match these properties, and would remove the matching objects from cache, without the need for a request to happen.

When does the lurker remove the objects from cache?

The varnishd binary has a couple of runtime settings that influence how bans are handled by the ban lurker:

ban_lurker_age: the ban lurker will ignore bans until they are this old (60 seconds by default)
ban_lurker_sleep: how long the ban lurker sleeps after examining a batch of objects (0.010 seconds by default; a value of 0 disables the lurker)
ban_lurker_batch: the number of objects the lurker examines before going back to sleep (1000 by default)

How to issue bans

There are 3 ways you can issue a ban:

Via an HTTP call that is defined in VCL
Via the varnishadm binary locally
Via a remote CLI call over TCP/IP

Here's a varnishadm example:

varnishadm ban obj.http.host == example.com '&&' obj.http.url '~' '\\.png$'

For more information about a remote CLI call, please have a look at: http://varnish-cache.org/docs/6.0/reference/varnish-cli.html#varnish-command-line-interface

Here's an HTTP example:

acl purge {
    "localhost";
    "192.168.55.0"/24;
}

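sub vcl_recv {
    if (req.method == "PURGE") {
        # Only clients listed in the purge ACL may invalidate content.
        if (client.ip !~ purge) {
            return (synth(405, "Not allowed."));
        }
        # With an x-purge-regex header: add a lurker-friendly ban for this host.
        if (req.http.x-purge-regex) {
            ban("obj.http.host == " + req.http.host + " && obj.http.url ~ " + req.http.x-purge-regex);
            return (synth(200, "Purged."));
        }
        # Without the header: purge only the exact URL that was requested.
        return (purge);
    }
}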

In this HTTP example, we combine purging and banning.

A regular purge issued by curl would look like this:

curl -XPURGE http://example.com/resource/1234abcd/abc

A more flexible regex purge using bans would be issued like this:

curl -XPURGE -H"x-purge-regex: /resource/1234abcd/.*" http://example.com

Issues with bans

Banning is not without issues.

The entire concept of bans revolves around matching patterns in a list with objects stored in cache.

The more items on the list, the more CPU cycles are required to have them all processed
The more objects in the cache, the more CPU cycles are required to have them all processed

So big ban lists and lots of objects could cause a lot of CPU overhead.

Tags over URLs

Another use case for bans is tag-based invalidation.

Sometimes it's quite hard to translate an entity change into matching URLs. Sometimes you don't even know which URLs are affected by an entity change.

In that case, it makes more sense to tag content and to invalidate objects that match one of these tags.

In your application, you would issue response headers like this:

X-Cache-Tags: type:resource, id:1234abcd, category:product

If you then want to remove all resources that are part of the product category, you could simply issue the following ban:

ban obj.http.x-cache-tags ~ category:product
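
As a rough sketch of how such a tag ban could be issued over HTTP, the fragment below accepts a request that carries the tag in a header. The BAN method name and the x-ban-tag header are assumptions made for this example, not Varnish conventions, and the purge ACL is the one from the HTTP example above. Keep in mind that the ban is a regular expression match against the whole header value, so category:product would also match a tag such as category:products.

```vcl
# Sketch only: "BAN" and "x-ban-tag" are hypothetical names chosen for this example.
sub vcl_recv {
    if (req.method == "BAN") {
        # Reuse the "purge" ACL defined in the HTTP example above.
        if (client.ip !~ purge) {
            return (synth(405, "Not allowed."));
        }
        # Refuse requests that don't say which tag to ban.
        if (!req.http.x-ban-tag) {
            return (synth(400, "x-ban-tag header required."));
        }
        # Object-scoped expression, so the ban lurker can evaluate it.
        ban("obj.http.x-cache-tags ~ " + req.http.x-ban-tag);
        return (synth(200, "Ban added."));
    }
}
```

With this in place, a call such as curl -XBAN -H"x-ban-tag: category:product" http://example.com would add the ban shown above.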

A better solution for tag-based invalidation

If you're planning to use bans for tag-based invalidation, you'll run into the same CPU issues if your ban list grows too quickly, and you have too many objects in cache that need to be checked against it.

A better solution is the use of xkey, which is a Varnish module that comes with the Varnish modules collection. See https://github.com/varnish/varnish-modules/blob/master/src/vmod_xkey.vcc for more info.

You have to compile this module, but the API is more flexible, and the performance is a lot better.
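
A minimal sketch of what xkey-based invalidation could look like, loosely following the pattern in the module's documentation. It assumes the backend tags each response with an xkey header (for example xkey: 1234abcd on every derivative of that resource). The xkey-purge request header name is an assumption made for this example, and the purge ACL is the one from the HTTP example above.

```vcl
import xkey;

sub vcl_recv {
    # Hypothetical convention: a PURGE request carrying an xkey-purge header
    # invalidates every object tagged with one of the listed keys.
    if (req.method == "PURGE" && req.http.xkey-purge) {
        # Reuse the "purge" ACL defined in the HTTP example above.
        if (client.ip !~ purge) {
            return (synth(405, "Not allowed."));
        }
        # xkey.purge() returns the number of objects it removed.
        set req.http.n-gone = xkey.purge(req.http.xkey-purge);
        return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
    }
}
```

If this is combined with the PURGE handling shown earlier, this block would have to come first, because that handler returns before a later vcl_recv gets a chance to run. Unlike a ban, xkey removes the tagged objects right away through a key lookup, so nothing is added to the ban list.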

Thanks for the very exhaustive answer. I would be issuing quite a few bans on medium-sized hierarchies (hundreds of derivatives) based on individual IDs, so tags may not work for me. But your ban lurker explanation is very useful. Does that mean that if I have stale derivatives with very long TTL that are rarely accessed, the ban lurker will remove them quite early, and eventually remove the ban too? — user3758232, Aug 10 '20 at 17:10
@user3758232 the ban lurker only removes objects that match an item on the ban list. What you're referring to is the *LRU eviction mechanism* that kicks in when the cache is full. When the cache is full and space needs to be freed to store new objects, the *LRU eviction mechanism* will remove the least recently used objects. — Thijs Feryn, Aug 11 '20 at 08:54
@user3758232 stale objects can be revalidated asynchronously, when a new request for that resource comes in. As long as the *grace time* hasn't elapsed, async revalidation will happen and meanwhile stale content is served. The default grace is *10 seconds*, and can be set in *VCL* via `beresp.grace`, or through `Cache-Control: stale-while-revalidate=10`. — Thijs Feryn, Aug 11 '20 at 08:58
Yes, I'm already using an LRU strategy to maintain a fixed-size cache where objects almost never expire because they almost never change. They get either invalidated or pushed out by ones in higher demand. That is why it's important for me to flush stale objects relatively quickly, so good ones don't get evicted too soon. I think that the timings you are talking about would work well for me. Of course I'll have to verify with live traffic... — user3758232, Aug 12 '20 at 17:06