
I have just set up a server running only Varnish in front of my backend server, which hosts two different Django sites served through nginx + gunicorn.

It seems to work, but the Age header is always 0, and according to the documentation that is not a good sign.

I want to cache pages for anonymous users, but not for authenticated users or for any user who has a cookie called "AUTHENTICATION".

Here is my default.vcl:

backend django {
    .host = "backend1";
    .port = "8080";
}


sub vcl_recv {

    # unless sessionid/csrftoken is in the request, don't pass ANY cookies (referral_source, utm, etc)
    if (req.request == "GET" && (req.url ~ "^/static" || (req.http.cookie !~ "sessionid" && req.http.cookie !~ "csrftoken" && req.http.cookie !~ "AUTHENTICATION"))) {
        remove req.http.Cookie;
    }


    #normalize accept-encoding to account for different browsers  
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }  



}  

sub vcl_fetch {

    # /static and /media files always cached
    if (req.url ~ "^/static" || req.url ~ "^/media") {
        unset beresp.http.set-cookie;
        return (deliver);
    }

    # pass through for anything with a session/csrftoken set
    if (beresp.http.set-cookie ~ "sessionid" || beresp.http.set-cookie ~ "csrftoken" || beresp.http.set-cookie ~ "AUTHENTICATION") {
        return (hit_for_pass);
    } else {
        return (deliver);
    }

}

Could it be that sessionid is set for every user, even if they are not logged in, and that this is preventing Varnish from effectively caching pages for anonymous users?

Edit:

Using isvarnishworking.com this is the output:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 15 Nov 2013 09:30:20 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Set-Cookie: __cfduid=d281023a84b2e5351d109c1848eeca1601384507820317; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.mydomain.com; HttpOnly
Vary: Cookie
X-Frame-Options: SAMEORIGIN
X-Varnish: 1602772074
Age: 0
Via: 1.1 varnish
CF-RAY: cdaec14fab00412
Content-Encoding: gzip

Edit 2:

My new default.vcl:

backend django {
    .host = "backend1";
    .port = "8080";
}

sub vcl_recv {  

    #normalize accept-encoding to account for different browsers  
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }  
}



sub vcl_fetch {  
  if (req.url ~ "^/static" || req.url ~ "^/media") {  
    unset beresp.http.set-cookie;  
  }  

  if (beresp.http.set-cookie !~ "sessionid" && beresp.http.set-cookie !~ "csrftoken" && beresp.http.set-cookie !~ "AUTHENTICATION") {  
    unset beresp.http.set-cookie; 
  }
} 

Result from isvarnishworking.com

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 15 Nov 2013 12:08:42 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Set-Cookie: __cfduid=d55ea1b56e978cbbf3384d0fa2f21571e1384517322491; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.mydomain.com; HttpOnly
Vary: Cookie
X-Frame-Options: SAMEORIGIN
X-Varnish: 1240916568
Age: 0
Via: 1.1 varnish
CF-RAY: cdbd4119f3b0412
Content-Encoding: gzip

Edit 3:

backend default {
    .host = "backend1";
    .port = "8080";
}

sub vcl_recv {

    # unless sessionid/csrftoken is in the request, don't pass ANY cookies (referral_source, utm, etc)
    if (req.request == "GET" && (req.url ~ "^/static" || (req.http.cookie !~ "sessionid" && req.http.cookie !~ "csrftoken" && req.http.cookie !~ "AUTHENTICATION"))) {
        remove req.http.Cookie;
    }

    #normalize accept-encoding to account for different browsers  
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }  

}  

sub vcl_fetch {

    # /static and /media files always cached
    if (req.url ~ "^/static" || req.url ~ "^/media") {
        unset beresp.http.set-cookie;
    }

    if (beresp.http.set-cookie !~ "sessionid" && beresp.http.set-cookie !~ "csrftoken" && beresp.http.set-cookie !~ "AUTHENTICATION") {
        unset beresp.http.set-cookie;
    }

}

My backend response (without varnish in front) is:

GET / HTTP/1.1
Host: www.mydomain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nb-no,nb;q=0.9,no-no;q=0.8,no;q=0.6,nn-no;q=0.5,nn;q=0.4,en-us;q=0.3,en;q=0.1
Accept-Encoding: gzip, deflate
Cookie: __cfduid=d8f496aef561efd7a30c3d9f909a02cf31384507505064; sessionid=twoq45r21gn341545ohubilyp739r42ee; _ga=GA1.2.382479980.1384507508
Connection: keep-alive

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 15 Nov 2013 14:37:53 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN
Content-Language: nb
CF-RAY: cdcae94f68105af
Content-Encoding: gzip
Tomas Jacobsen

2 Answers


Could it be that sessionid is set for every user, even if they are not logged in, and that this is preventing Varnish from effectively caching pages for anonymous users?

You are correct. After logout a new session is immediately started and a new session cookie is planted on the user's machine. To work around this problem I created a custom logout view that I use on sites that sit behind Varnish:

from django.conf import settings
# Function-based logout view from older Django releases (removed in Django 2.1).
from django.contrib.auth.views import logout

def logout_user(request):
    """After logging out some of the cookies should be deleted,
    allowing upstream cache to work effectively."""
    response = logout(request)
    request.session.modified = False  # forces session middleware not to set its own cookie
    response.delete_cookie(settings.CSRF_COOKIE_NAME)
    response.delete_cookie(settings.SESSION_COOKIE_NAME)
    return response

As you can see, I force the session middleware not to set a new cookie, and then I delete the old cookies (including the CSRF cookie).

Edit: Also, this code seems completely unnecessary, as Varnish does this automatically for any cookie being set:

  # pass through for anything with a session/csrftoken set  
  if (beresp.http.set-cookie ~ "sessionid" || beresp.http.set-cookie ~ "csrftoken" || beresp.http.set-cookie ~ "AUTHENTICATION") {  
    return (hit_for_pass);  
  } else {  
    return (deliver);  
  }  

Also note that hit_for_pass will make a particular URL uncacheable for a couple of minutes (for all users!). Try these three diagnostics:

  1. Clear cookies, remove the code above, restart Varnish, check if Age is still set to 0.
  2. Check the headers coming from your backend (nginx). Maybe it is setting the Age value itself, or forcing Varnish to do so through other cache-control headers?
  3. Use varnishlog to check if those responses are being cached.
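
The first and third diagnostics can be roughly approximated from the client side. The sketch below is only a quick check, not a substitute for inspecting Varnish's own logs. It assumes Varnish 3 conventions, where a cache hit carries a non-zero Age and/or two transaction IDs in X-Varnish. The function names are hypothetical, and any intermediary in front of Varnish may alter these headers.

```python
import urllib.request


def looks_like_cache_hit(headers):
    """Guess whether a response was served from Varnish's cache.

    `headers` maps header names to values. Assumption: a hit shows
    Age > 0 and/or two request IDs in X-Varnish (Varnish 3 behaviour).
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    age = normalized.get("age", "0")
    x_varnish = normalized.get("x-varnish", "")
    age_positive = age.isdigit() and int(age) > 0
    two_ids = len(x_varnish.split()) == 2
    return age_positive or two_ids


def fetch_headers(url):
    """Fetch `url` without sending any cookies and return its response headers."""
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request) as response:
        return dict(response.getheaders())
```

To use it, call `fetch_headers()` twice on the same URL (the first call may be a miss that populates the cache) and pass the second result to `looks_like_cache_hit()`. No cookies are sent, so the request looks like one from a brand-new anonymous visitor.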

Edit 2: Your isvarnishworking.com output shows that the server sets a cookie called __cfduid. Every time a cookie is set, Varnish automatically enters hit-for-pass mode (see the code I linked to in the edit above). That is most likely the cause of the problem, and I guess it was the reason for the code I deemed unnecessary. I'd try explicitly removing all unknown cookies:

sub vcl_fetch {  
  if (req.url ~ "^/static" || req.url ~ "^/media") {  
    unset beresp.http.set-cookie;  
  }  

  if (beresp.http.set-cookie !~ "sessionid" && beresp.http.set-cookie !~ "csrftoken" && beresp.http.set-cookie !~ "AUTHENTICATION") {  
    unset beresp.http.set-cookie; 
  }
} 
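
To make the intent of the second condition concrete, here is a small Python model of the same rule. It is an illustration only, not something Varnish runs: a Set-Cookie header survives only if it mentions one of the three cookies the site cares about. The name `keep_set_cookie` is made up for this sketch.

```python
import re

# Cookie names taken from the VCL condition above (matched case-sensitively).
PROTECTED = re.compile(r"sessionid|csrftoken|AUTHENTICATION")


def keep_set_cookie(set_cookie):
    """Mirror the VCL test: keep the header only if it names a protected cookie.

    `set_cookie` is the raw Set-Cookie value, or None if the backend sent none.
    """
    if set_cookie is None:
        return False
    return PROTECTED.search(set_cookie) is not None
```

Under this rule a header such as `__cfduid=...; path=/` would be dropped, while `sessionid=...` would be kept, so only responses that really start a session stay uncacheable.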
Ludwik Trammer
  • This seems very important to have, but what about visitors who are coming to the site for the very first time? Those are the users I want to cache the site for. – Tomas Jacobsen Nov 15 '13 at 09:06
  • New users should not have `sessionid` set, unless you are using their session in some way. Try an experiment: clear your cookies and check whether your site sets sessionid without logging in. – Ludwik Trammer Nov 15 '13 at 09:14
  • You are correct! sessionid is not set for new users, but why is varnish setting age to 0? I made an edit to my question, to include the output from isvarnishworking.com – Tomas Jacobsen Nov 15 '13 at 09:35
  • First of all, what is your `VARNISH_TTL` value in `/etc/sysconfig/varnish`? – Ludwik Trammer Nov 15 '13 at 09:37
  • It is using the default value 120 – Tomas Jacobsen Nov 15 '13 at 09:45
  • Could you show your response headers without Varnish then? Maybe this value is set even before Varnish... – Ludwik Trammer Nov 15 '13 at 09:47
  • Ok, I read your `default.vcl` one more time, more carefully this time and I think I know what the problem is. I'll update my answer in a moment. – Ludwik Trammer Nov 15 '13 at 09:56
  • Hmm, does my default.vcl look correct now? Still getting Age:0 on the header. – Tomas Jacobsen Nov 15 '13 at 12:13
  • Your previous `vcl_recv` was better than the new one. Have you tried the rest of the things I proposed in my first edit? – Ludwik Trammer Nov 15 '13 at 13:01
  • I'm sorry, it's not clear what the vcl file should look like. Should I remove something from the original, or should I add something? I have checked my backend response, and it is not setting any Age on the response, so I think something is wrong with my vcl file. – Tomas Jacobsen Nov 15 '13 at 14:30
  • I'd like to see a vcl file with `vcl_recv` from your original version and `vcl_fetch` from your new version. Are there also no `Cache-Control` or other cache-related headers on your back-end response? – Ludwik Trammer Nov 15 '13 at 14:33
  • Updated with new vcl, and the response without varnish in front of the backend. – Tomas Jacobsen Nov 15 '13 at 14:44
  • I'm not sure I've got more ideas. Do URLs within /static/ have the same problem? – Ludwik Trammer Nov 15 '13 at 14:52
  • Okay. Yes. /static/ has the same Age: 0 :/ Thank you for your help! – Tomas Jacobsen Nov 15 '13 at 14:56
  • Oh, and please note that what you pasted as your "backend response" includes the `sessionid` cookie, so for this particular response Varnish actually **should** set Age to 0. When testing always make sure you don't send cookies. – Ludwik Trammer Nov 15 '13 at 15:02
  • Strange. I was under the impression that django only set sessionid if the user was authenticated, but if I blitz.io the domain, I see the response is setting a sessionid. – Tomas Jacobsen Nov 15 '13 at 15:17
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/41269/discussion-between-ludwik-trammer-and-garreth-00) – Ludwik Trammer Nov 15 '13 at 15:34

Out of the box, Varnish is set up for broken application servers that don't send a Vary: Cookie header. Django is smart and will send a Vary: Cookie header if you do anything with the session id. So the best thing to do is to edit your VCL and remove anything to do with cookies. Varnish will handle the Vary on its own.

Here is the default vcl_recv, with the section to remove marked:

sub vcl_recv {
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For =
                req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    /// REMOVE THIS SECTION
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    /// ENDREMOVE SECTION
    return (lookup);
}

Also, in your Django code, both Python and templates, make sure you never check or use request.user or the session. If you do, the page behaves differently for different users and so has to be cached separately for each of them (that's why Django sends a Vary: Cookie header, which tells the cache to store different copies based on cookies, i.e. sessionid).
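
A toy model may help show why Vary: Cookie has this effect. The sketch below is not how Varnish is implemented. It only assumes that a cache keeps one variant per distinct value of each request header named in Vary. All names in it are invented for illustration.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers listed in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in sorted(names))


# Two visitors with different session cookies requesting the same page.
alice = {"Cookie": "sessionid=aaa"}
bob = {"Cookie": "sessionid=bbb"}

# With "Vary: Cookie" each visitor ends up with a separate cached variant.
assert variant_key("/", alice, "Cookie") != variant_key("/", bob, "Cookie")

# Without Cookie in Vary, both visitors share a single cached copy.
assert variant_key("/", alice, "Accept-Language") == variant_key("/", bob, "Accept-Language")
```

In this model, a page that never touches the session gets no Vary: Cookie, so every anonymous visitor maps to the same key and can be served the same cached copy.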

dalore
  • I'm still trying to figure this all out. I'm just using cloudflare, not varnish, but I assume the principle is the same. I'm looking at my request and responses logged in and logged out. The only difference I see is the Session ID cookie. This tells me that Cloudflare would have no way of differentiating logged in from logged out. As far as I can tell, it would cache everybody separately, since it would key everyone on their session ID cookie. What am I missing here? – orblivion Mar 13 '15 at 00:08