I'm having all kinds of problems trying to restrict bad bots on my Apache 2.2 server, and am hoping somebody can assist.
I have banged my head on the wall for days trying to get this working, and used several different methods, but none seem to work properly.
I have several sites on one machine, and I could, of course, deny bad bots in individual .htaccess files for each site, but that's a pain to maintain. So, I want to put the restrictions in httpd.conf.
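For reference, the sort of thing I've been putting in the individual .htaccess files is roughly this (paraphrased, so the exact bot list and details vary from site to site):
# per-site .htaccess (approximate - actual bot list varies)
SetEnvIfNoCase User-Agent "lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "libwww" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
Order allow,deny
Allow from all
Deny from env=bad_bot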
The first method I tried (and which I thought was working) was a <Location "/"> section, e.g.:
<Location "/">
SetEnvIfNoCase User-Agent "lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "libwww" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
Deny from env=bad_bot
</Location>
However, I found that although this did block the bots, it introduced another problem: it then allows hidden files, such as .htaccess and .htpasswd, to be served up, even though there is code in httpd.conf to disallow that. I played around with the order of the <Files ...> block (which does the file-access blocking) and the <Location ...> block, but no matter which one took precedence, the hidden files were still served. If I take out the <Location ...> block, the server prevents the hidden files from being served, as it should.
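For what it's worth, the code in httpd.conf that's supposed to protect those files is, as far as I can tell, just the stock Apache 2.2 block, or something very close to it:
# standard Apache 2.2 hidden-file protection (paraphrased - mine may differ slightly)
<FilesMatch "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</FilesMatch>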
I've also tried doing rewrites in httpd.conf, but that doesn't seem to work either (the block is at the foot of the file, though I've tried it above the virtual hosts section too), e.g.:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AlphaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule ^(.*)$ - [L,R=403]
</IfModule>
I get no errors with either method, but neither does what I want: this second method simply doesn't appear to block the bots at all.
I've also tried things like the following, again without success:
<Location "/var/www/sites/">
SetEnvIf User-Agent BLEXBot GoAway
Order allow,deny
Allow from all
Deny from env=GoAway
</Location>
... and
RewriteCond %{HTTP_USER_AGENT} "blexbot" [nocase]
RewriteRule ^.*$ - [forbidden,last]
... and seemingly every other possible combination of things. But I can still only block bots with individual .htaccess files, or with the <Location "/"> section (which then exposes the hidden files).
As can be seen, one of the user-agent strings I'm testing with is "Blexbot" (and variations of it), so the last thing I've tried is modsecurity. However, I don't seem to be able to get that working properly either; here are a couple of examples I've tried:
SecRule REQUEST_HEADERS:User-Agent "BLEXBot" "deny,status:403,id:5000218,msg:'Badbot test for Blexbot'"
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"
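In case it's relevant, the badbots.txt referenced by @pmFromFile is just a plain text file with one user-agent fragment per line; mine is along these lines, give or take a few entries:
BLEXBot
AhrefsBot
AlphaBot
Baiduspider
lwp-trivial
libwww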
If I look in /var/log/modsec_audit.log I can see that modsecurity does identify the user-agent and logs an entry to that effect, but it doesn't actually prevent the pages from being served (which is kinda the whole point). I do note that modsec_audit.log has entries of Engine-Mode: "DETECTION_ONLY", which might explain why the pages are still being served, but I'm not familiar with modsecurity at all, so I'm not really sure what it's doing.
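My guess is that this relates to the SecRuleEngine setting in the main ModSecurity config (on this machine I think that's /etc/modsecurity/modsecurity.conf, though I may be looking at the wrong file), which appears to be set to detection-only:
# current setting - presumably the cause of the DETECTION_ONLY entries in the audit log
SecRuleEngine DetectionOnly
# presumably it needs to be "SecRuleEngine On" for rules to actually block,
# but I'm not sure whether that's the only change required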
If anyone can assist, it would be truly appreciated! I just need one method to work, but I kind of like the idea of using modsecurity if I can, as it seems I could then keep all the bad-bot entries in a single separate file.