I would like to rate-limit (but not block) various crawlers. Based on other examples of similar questions, I've come up with this:

http {
  ## 10MB zone, 5 reqs per sec
  limit_req_zone $limit zone=zone_a:10m rate=5r/s;

  geo $limited {
    default 0;
    157.55.39.31/32 1;
    157.55.39.39/32 1;
    157.55.39.40/32 1;
    157.55.39.166/32 1;
    157.55.39.167/32 1;
    157.55.39.212/32 1;
    157.55.39.213/32 1;
    68.180.229.48/32 1;
  }

  map $limited $limit {
    0 "false";
    1 "true";
  }
  ## - do something -
}

If I understand it correctly, a matched IP in the geo-block will set $limited to 1 and then the map-block will set $limit to true/false. I think what needs to happen next is an 'if' statement like this:

if ($limit = "true") {
  limit_rate zone=zone_a burst=10;
}

Is that the correct way to do this? Is there a downside to rate-limiting?
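For comparison, here is a minimal sketch of a pattern that may be closer to what the nginx documentation describes. It rests on three points from the docs: requests whose zone key is empty are not counted by limit_req_zone, limit_req (not limit_rate, which throttles response bandwidth) is the directive that takes zone= and burst=, and limit_req is valid in http, server, and location contexts but not inside 'if'. The variable name $limit_key, the listen port, and the location are assumptions made for illustration, and only two of the addresses above are repeated.

```nginx
http {
  # Flag crawler addresses with 1; everyone else gets 0.
  geo $limited {
    default          0;
    157.55.39.31/32  1;   # sample entries copied from the list above
    68.180.229.48/32 1;
  }

  # Hypothetical variable name. An empty key means the request is not
  # accounted against the zone; flagged clients are keyed by their address.
  map $limited $limit_key {
    0 "";
    1 $binary_remote_addr;
  }

  ## 10MB zone, 5 reqs per sec, counted per flagged client address
  limit_req_zone $limit_key zone=zone_a:10m rate=5r/s;

  server {
    listen 80;            # assumed for the sketch

    location / {
      # Applied unconditionally; unflagged clients bypass it via the empty key.
      limit_req zone=zone_a burst=10;
    }
  }
}
```

With this arrangement, flagged requests within the burst are delayed rather than refused, while requests beyond the burst are rejected (503 by default), so it slows crawlers down more than it blocks them but does not avoid errors entirely.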

  • IMHO legitimate crawlers are relatively well-behaved and make up only a tiny fraction of daily traffic. If they do cause a performance impact you probably have a different problem altogether and should seriously consider that crawlers are only the symptom and not the cause... – HBruijn Feb 11 '15 at 23:02
  • We blocked the IPs I've listed at the firewall and saw CPU utilization drop ~40% on our servers. I don't want to completely thwart the crawlers, just not have them overrun my site. – Steve Harris Feb 11 '15 at 23:31
  • Perhaps fix your app so it doesn't DoS itself when it receives some traffic? If you rate-limit Bing and Yahoo, they'll just stop wasting their time trying to crawl your site (which, if you send them enough errors, they might do of their own accord). You can [tell them](http://www.bing.com/webmaster/help/crawl-control-55a30302) [to slow down](http://help.yahoo.com/l/nl/yahoo/ysearch/slurp/slurp-03.html) anyway. – AD7six Feb 12 '15 at 09:31
  • I confess to being ignorant of the uses of robots.txt. Do all crawlers honor the settings? Also, on your point about fixing or modifying the app: that is not within my power; I can only work with the tools at hand. – Steve Harris Feb 12 '15 at 14:02

0 Answers