I have a reasonably large Drupal site sitting behind Varnish on a box with 4GB of memory. I had thought (hoped? assumed?) that standing Varnish up in front of Nginx/PHP-FPM would take the load off PHP, but PHP still seems very active, and I get more warnings than I expected that memory usage has crept above 90%.

Am I fundamentally misunderstanding what Varnish does? My expectation was that it would cache the full HTTP response and just kick it out if it was cached and no request to Nginx, much less PHP, would ever get made as long as the requested URL was in cache. In talking to people who I'd expect to know more than I know on this topic, though, I'm getting information that would indicate otherwise.

Can anyone fill in these gaps? What should I expect from Varnish within the context of a Drupal site where the vast majority of traffic is anonymous and therefore subject to Varnish caching?

UPDATE

Upon request, here is my config (default.vcl - also in a gist):

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}


acl purge {
    # Add local/internal server IPs
  "localhost";
  "127.0.0.1";
}


sub vcl_recv {
    if (req.http.User-Agent ~ "MS Search (5|6).0 Robot") {
    error 403 "Forbidden";
    }
    if (req.http.User-Agent ~ "Microsoft-WebDAV-MiniRedir") {
    error 403 "Forbidden";
    }

    remove req.http.ETAG;
    remove req.http.X-Generator;

    set req.grace = 30m;

    if (req.url ~"apc.php") {
    return (pass);
    }

    set req.http.host = regsuball(req.http.host, ":.*", "");

  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return(lookup);
  }

  if (req.restarts == 0) {
    if (req.http.x-forwarded-for) {
      set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    }
    else {
      set req.http.X-Forwarded-For = client.ip;
    }
  }

  #--Add Server Status Exclusion
  if (req.url ~ "/status") {
    if (client.ip ~ purge) {
        return(pass);
    }
    else {
        error 750 "http://"+req.http.host;
    }
  }

  #--Add Munin Exclusion
  if (req.url ~ "munin") {
    set req.backend = default;
    set req.http.X-Cache = "Default-Pass";
    return(pass);
  }

  # Handle compression correctly. Different browsers send different
  # "Accept-Encoding" headers, even though they mostly all support the same
  # compression mechanisms. By consolidating these compression headers into
  # a consistent format, we can reduce the size of the cache and get more hits.
  # @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
  # Properly handle different encoding types
  if (req.http.Accept-Encoding) {
    if (req.url ~ "(?i)\.(bmp|bz2|gif|gz|ico|img|jpeg|jpg|lzma|mp3|ogg|png|swf|tbz|tga|tgz|wmf|zip)(\?.*|)$") {
        remove req.http.Accept-Encoding;
    }
    else if (req.http.Accept-Encoding ~ "gzip") {
        set req.http.Accept-Encoding = "gzip";
    }
    else if (req.http.Accept-Encoding ~ "deflate") {
        set req.http.Accept-Encoding = "deflate";
    }
    else {
        remove req.http.Accept-Encoding;
    }
  }
  if ( req.restarts > 0
       && req.url ~ "(?i)\.(bmp|bz2|css|gif|gz|ico|img|jpeg|jpg|js|lzma|mp3|ogg|png|swf|tbz|tga|tgz|txt|wmf|zip)(\?.*|)$"
     ) {
    return(lookup);
  }

  if ( req.restarts > 0
     || req.http.Content-Type ~ "multipart/form-data"
     || req.http.X-Requested-With == "XMLHttpRequest" #dont cache ajax requests
     || req.url ~ "nocache"
     #|| req.request == "POST" #never cache POST requests
     || req.url ~ "/(delete|add|edit|update)|render=media" #--Dont intercept "itok=" #-- || req.url ~ "itok="
     || ( req.http.Referer ~ "/(delete|add|edit|update)" )
     || req.url ~ "/(upload/profileimage|upload/bannerimage|docbldr/api/uploadimage2|db/item/save)"
     || req.http.Referer ~ "/(upload/profileimage|upload/bannerimage|docbldr/api/uploadimage2|db/item/save)"
  ) {
    return(pass);
  }

  # Allow the backend to serve up stale content if it is responding slowly.

  # Do not cache these paths.
  if (req.url ~ "^/status\.php$" ||
      req.url ~ "^/update\.php$" ||
      req.url ~ "^/ooyala/ping$" ||
      req.url ~ "^/admin/build/features" ||
      req.url ~ "^/info/.*$" ||
      req.url ~ "^/flag/.*$" ||
      req.url ~ "^.*/ajax/.*$" ||
      req.url ~ "^.*/ahah/.*$"
    ) {
      return (pass);
  }

  # Do not allow outside access to cron.php or install.php.
  if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ purge) {
    error 404 "Page not found.";
  }

  #--If POST requests don't include file uploads and the master is being overwhelmed by POSTS, uncomment below and comment out the POST line above
  if ( req.request == "POST"  #never cache POST requests
     || req.url ~ "(admin|login)"
     ) {
    return(pass);
  }

  ## always cache these images & static assets
  ## MUST OCCUR AFTER "req.restart" CHECK ABOVE!
  if (req.request ~ "GET|HEAD" && (req.url ~ "(?i)\.(bmp|bz2|css|gif|gz|ico|img|jpeg|jpg|js|lzma|mp3|ogg|png|swf|tbz|tga|tgz|txt|wmf|zip)(\?.*|)$"
  )) {
    remove req.http.cookie;
    return(lookup);
  }

  ### don't cache authenticated sessions
  if (req.http.Cookie && req.http.Cookie ~ "(PHPSESSID|^SESS)") {
    return(pass);
  }

  # DO cache this ajax request. # WordPress
  if(req.http.X-Requested-With == "XMLHttpRequest" && req.url ~ "recent_reviews") {
    return (lookup);
  }


  # Remove all cookies that Drupal doesn't need to know about. ANY remaining
  # cookie will cause the request to pass-through to Apache. For the most part
  # we always set the NO_CACHE cookie after any POST request, disabling the
  # Varnish cache temporarily. The session cookie allows all authenticated users
  # to pass through as long as they're logged in.
  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      unset req.http.Cookie;
      return(lookup);
    }
    else {
      return (pass);
    }
  }

    return (lookup);

}


sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}


sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }

  if (req.url ~ "(?i)\.(bmp|bz2|css|gif|gz|ico|img|jpeg|jpg|js|lzma|mp3|ogg|png|swf|tbz|tga|tgz|txt|wmf|zip)(\?.*|)$") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*", "");
  }
}


sub vcl_hash {

  hash_data(req.url);
  if (req.http.host) {
      hash_data(req.http.host);
  } else {
      hash_data(server.ip);
  }
  if (req.http.x-forwarded-proto) {
      hash_data(req.http.x-forwarded-proto);
  }
  return (hash);
}


sub vcl_fetch {

  set beresp.grace = 30m;

        remove beresp.http.ETAG;
        remove beresp.http.X-Generator;
        remove beresp.http.Link;
        remove beresp.http.Server;

  if (beresp.status == 404 && req.restarts == 0) {
    return(restart);
  }

  # Keep static content in Browser Cache and Varnish Cache for a while. Tweak as needed.
  if (req.url ~ "(?i)\.(bmp|bz2|css|gif|gz|ico|img|jpeg|jpg|js|lzma|mp3|ogg|png|swf|tbz|tga|tgz|txt|wmf|zip)(\?.*|)$") {
    #-- Prevent Varnish from caching unsuccessful static requests
    if (beresp.status != 200 && req.restarts == 0) {
      return(restart);
    }

    unset beresp.http.set-cookie;
    # Cache-Control directives are comma-separated, not semicolon-separated.
    set beresp.http.cache-control = "max-age=3600, public";
    set beresp.ttl = 1800s;
  }

  set beresp.http.X-Host = req.http.host;
  set beresp.http.X-URL = req.url;

  # make Varnish compress content before storing it in cache and store text content for a while.
  if (beresp.http.content-type ~ "text") {
    set beresp.ttl = 1800s;
    set beresp.do_gzip = true;
  }
}


sub vcl_error {
  # Redirect to some other URL in the case of a homepage failure.
  if (obj.status == 750) {
    set obj.http.Location = obj.response;
    set obj.status = 302;
    return(deliver);
  }

  # Otherwise redirect to the homepage, which will likely be in the cache.
  set obj.http.Content-Type = "text/html; charset=utf-8";
  synthetic {"
<html>
<head>
  <title>Page Unavailable</title>
  <style>
    body { background: #303030; text-align: center; color: white; }
    #page { border: 1px solid #CCC; width: 500px; margin: 100px auto 0; padding: 30px; background: #323232; }
    a, a:link, a:visited { color: #CCC; }
    .error { color: #FFF;font-size:24px;padding:15px; }
  </style>
</head>
<body onload="setTimeout(function() { window.location = '"} + req.url + {"' }, 5000)">
  <div id="page">
  <h1 class="title">Page Unavailable</h1>
  <p>The page you requested is temporarily unavailable.</p>
  <p>We'll try again in 5 seconds.</p>
  <div class="error">(Error "} + obj.status + " " + obj.response + {")</div>
  </div>
</body>
</html>
"};
  return (deliver);
}


sub vcl_deliver {
}
Rob Wilkerson
  • Maybe. It depends on how you configured Varnish, and you didn't share that (you should have). – Michael Hampton Dec 30 '14 at 15:22
  • @MichaelHampton Thanks. Config is linked in an update to the original post. – Rob Wilkerson Dec 30 '14 at 15:42
  • Is it possible to get a link to the site in question? If not, are you using any modules which may be making ajax requests back to the server? Ads, statistics etc (A list of enabled modules might help)? What headers are set on the Drupal responses? Are they actually marked as being cacheable? Is the increased memory usage from PHP itself, or is it driven up by the amount of memory varnish uses? Are you using the Drupal varnish module? What are your settings on the "Performance" page? – Phizes Dec 30 '14 at 19:11

1 Answer

It really depends on how your site is set up. Out of the box, Varnish will not cache a response if:

  1. The request contains an Authorization header
  2. The request contains a Cookie header
  3. The request method is anything other than GET or HEAD

Also note that the default TTL is 120 seconds (it is a default, not a minimum), so you may want to increase that.

If you're using a PHP session then you may be coming unstuck with the second point.
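
As a rough sketch of that default behaviour, here is a simplified version of the built-in vcl_recv logic in Varnish 3 syntax (the same syntax as the config in the question). It is an illustration from memory, not the full built-in VCL, which handles a few more cases:

```vcl
sub vcl_recv {
  # Only GET and HEAD requests are candidates for a cache lookup.
  if (req.request != "GET" && req.request != "HEAD") {
    return (pass);
  }

  # Requests carrying credentials or any cookie bypass the cache.
  if (req.http.Authorization || req.http.Cookie) {
    return (pass);
  }

  return (lookup);
}
```

This is why a single stray cookie (a session cookie, or one set by analytics JavaScript) is enough to send every request from that visitor to the backend unless the VCL strips it first.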

I'm not aware of PHP itself sending cache-control headers by default, but if you're using some framework this may be done automatically.

Implementing caching for something like images is pretty easy (so long as they're under a constant path) but dynamic content like PHP generated pages can take a little more thought in regards to implementation.

Edit: snip I see you have APC related code in your config so scratch the advice to implement a PHP accelerator as well!

Edit 2: Alright I was a bit short in that answer come to think of it. Using varnishncsa will help you see what pages are hitting or missing, so try running this:

varnishncsa -f -m 'TxHeader:X-Cache: MISS'

The -m switch matches on the same tags as what are marked for each transaction when you run varnishlog (grouped by their numbers).
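
Note that this filter only matches if Varnish actually sends an X-Cache response header, and the vcl_deliver in the question is empty. A minimal sketch of one way to add it, again in Varnish 3 syntax; the header name X-Cache and the HIT/MISS values are assumptions chosen to line up with the filter above:

```vcl
sub vcl_deliver {
  # obj.hits counts how many times this object has been served from cache;
  # zero means the response was just fetched from the backend.
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}
```

With something like this in place, the header also shows up in the browser or in curl -I output, which makes it easy to spot-check individual URLs.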

Other useful commands to help you out are varnishstat (statistics, including the hit rate), varnishtop (shows the most frequent tags across all requests) and varnishhist (a histogram of transaction times from client receive to server receive, from memory; a pipe character (|) is a cache hit, a hash character (#) is a cache miss, and the scale is logarithmic, with 1e0 being 1 second).

yoshiwaan
  • In addition to the above: Drupal has many pitfalls that can prevent Varnish from caching pages. In Drupal 7, settings in settings.php control much of this behaviour; for Drupal 6, the easiest path is to use Pressflow instead. (By default, Drupal 6 sets a cookie even for anonymous users, so Varnish would not cache its pages.) There are many other ways to optimize a server for Drupal that can drastically reduce resource usage, for example memcached, a non-default MySQL configuration, and tuning PHP and opcode cache parameters. – Phizes Dec 31 '14 at 13:53