
There are many bots out there, and the Majestic bot is one of them. It sometimes crawls a website's pages aggressively, e.g. the server sometimes receives 100 requests within a single second. Majestic is just an example: I don't want to block one particular bot, because there are many of them and trying to detect them all is a waste of time.

My question is: how can I rate-limit the HTTP requests that a given bot sends to an Nginx server? For example, allow at most 10 requests per second from any single IP address. And does this consume noticeable resources, given that each IP address has to be checked and stored somewhere?

Siah
  • Thanks for the question; it saved me the time of asking the same thing myself! There seems to be plenty of information around on how to completely block bot access, but that's not what we want: we want as many bots as possible to crawl our websites, but _not_ to have them hogging all the resources so that nobody else can view the site... I have a few sites where 70-100% of the average daily traffic is _only_ bots. A waste of CPU and bandwidth! But of course I want them to have a fair chance to crawl through all those pages... – Gwyneth Llewelyn Dec 02 '20 at 19:35

1 Answer


Any attempt at elaboration on my part beyond referring to the following is pointless:

http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
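A minimal sketch of what that module page describes, applied to the numbers in the question (about 10 requests per second per IP address). The zone name `perip`, the 10 MB zone size, the `burst=20` value, the `listen 80` server and the `location /` placement are illustrative assumptions, not values taken from the docs; tune them for your own site.

```nginx
http {
    # Key the limit on the client IP. $binary_remote_addr is compact
    # (4 bytes for IPv4, 16 bytes for IPv6), unlike the text $remote_addr.
    # "perip" is a hypothetical zone name; 10m is the shared-memory size.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        listen 80;

        location / {
            # Let a client exceed the rate by up to 20 queued requests,
            # served immediately (nodelay); anything beyond that is rejected.
            limit_req zone=perip burst=20 nodelay;

            # Reject with 429 Too Many Requests instead of the default 503.
            limit_req_status 429;
        }
    }
}
```

On the resource question, the same documentation states that each stored state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, and that one megabyte of zone holds about 16 thousand 64-byte or 8 thousand 128-byte states. A 10m zone therefore tracks roughly 80 thousand addresses on a 64-bit server, and when the zone is full nginx removes the least recently used state. Run `nginx -t` to check the syntax before reloading.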

Izaac
  • Well, I was reading the same page. The answer is RTFM! I'll accept this answer. – Siah Jan 24 '17 at 21:51
  • Chiming in late, but for an example of how to beautifully integrate `nginx`'s rate-limiting abilities with `fail2ban` so that repeat trespassers are automatically locked out, see https://www.webfoobar.com/node/36 – Gwyneth Llewelyn Dec 02 '20 at 19:32