I need to block a number of bots from crawling a few hundred sites hosted on an Nginx web server running on an Ubuntu 16.04 machine.
I've found a fairly simple example here (the important part of the code is below), but it seems this approach only works inside a server block (because of the if statement), and I think that's a horrible idea, especially with a large number of sites on the machine.
if ($http_user_agent ~* (ahrefs|wget|crawler|majestic)) {
    return 403;
}
So, the question is: can something similar be achieved from the main nginx.conf file so that it works for all the domains currently defined in the sites-enabled folder, as well as any added in the future?
I've also read about the map approach and found a whole project on GitHub that uses it, https://github.com/mariusv/nginx-badbot-blocker, but it still requires editing every file in the sites-enabled folder, which would take too much time for a few hundred sites that are already up and running.
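For reference, my understanding of the map approach is roughly the following (the $limit_bots variable name is just mine for illustration): a map defined once in the http context of nginx.conf, plus a check that still has to go into every server block, which is exactly the part I'd like to avoid:

map $http_user_agent $limit_bots {
    default                            0;
    ~*(ahrefs|wget|crawler|majestic)   1;
}

# ...and then, in each server block under sites-enabled:
if ($limit_bots) {
    return 403;
}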