
I'm having all kinds of problems trying to restrict bad bots on my Apache 2.2 server, and am hoping somebody can assist.

I have banged my head on the wall for days trying to get this working, and used several different methods, but none seem to work properly.

I have several sites on one machine, and I could, of course, deny bad bots in individual .htaccess files for each site - but that's a pain to maintain. So, I want to put the restrictions in httpd.conf.

The first method I was using (which I thought was working) was to use a <Location "/"> section, e.g.

<Location "/"> 
SetEnvIfNoCase User-Agent "lwp-trivial" bad_bot 
SetEnvIfNoCase User-Agent "libwww" bad_bot 
SetEnvIfNoCase User-Agent "Wget" bad_bot 
Deny from env=bad_bot 
</Location>

However, I found that although this did block the bots, it introduced another problem: hidden files such as .htaccess and .htpasswd were then served up, even though there is code in httpd.conf to disallow that. I played around with the order of the <Files ...> block (which does the file-access blocking) and the <Location ...> block, but no matter which one took precedence the hidden files were still served. If I take out the <Location ...> block, the server prevents the hidden files from being served, as it should.
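For reference, the hidden-file protection in my httpd.conf is essentially the stock Apache 2.2 block (quoted from memory, so the exact form is approximate):

<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</Files>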

I've also tried doing rewrites in httpd.conf but that doesn't seem to work either (the block is at the foot of the file, but I've tried it above the virtual hosts section too), e.g.

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AlphaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteRule ^(.*)$ - [L,R=403] 
</IfModule>

I get no errors using either method, but they're not doing what I want. This second method simply doesn't appear to block the bots.

I've also tried stuff like the following, also without success:

<Location "/var/www/sites/">
SetEnvIf User-Agent BLEXBot GoAway
Order allow,deny
Allow from all
Deny from env=GoAway
</Location>

... and

RewriteCond %{HTTP_USER_AGENT} "blexbot" [nocase]
RewriteRule ^.*$ - [forbidden,last]

... and seemingly every other possible combination. But I can still only block bots with individual .htaccess files, or with the <Location "/"> section (which exposes the hidden files).

As can be seen, one of the user-agent strings I'm testing with is "Blexbot" and variations of it, and so the last thing I've tried is with modsecurity.

However, I don't seem able to get that working properly either: here are a couple of examples which I've tried:

SecRule REQUEST_HEADERS:User-Agent "BLEXBot" "deny,status:403,id:5000218,msg:'Badbot test for Blexbot'"
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"

If I look in /var/log/modsec_audit.log then I can see that modsecurity does identify the user-agent, and provides a log entry to that effect, but it doesn't actually prevent the pages from being served (which is kinda the whole point).

I do note that the modsec_audit.log has entries of Engine-Mode: "DETECTION_ONLY", which might explain the pages still being served, but I'm not familiar with much of modsecurity at all, so I'm not really sure about what it's doing.

If anyone can assist it would be truly appreciated! I just need a single method to work, but I rather like the idea of using modsecurity if I can get it going, as it seems I can keep all the bad-bot entries in a single separate file.

Cheddar
  • You can create and maintain a single configuration file with your custom directives and load that in each VirtualHost with an [`Include`](https://httpd.apache.org/docs/2.4/mod/core.html#include) or `IncludeOptional` directive, and by definition any directive that is allowed in a `.htaccess` file should also work from a `<Directory>` block – HBruijn Nov 27 '17 at 10:20
  • The last condition does not need "OR"! – Daniel Ferradal Nov 27 '17 at 12:51
  • @HBruijn. I've already gotten it working by using a `<Directory>` block and loading the `SetEnvIf` directives from an `Include` file in `httpd.conf`. Thanks for the comment though. @ezra-s. Yeah, true; that was just a cut-away from a long list; the final entry in the actual list had no final `[OR]`. – Cheddar Nov 28 '17 at 12:36
  • Wrapping rewrite rules in `<IfModule mod_rewrite.c>` is usually a bad idea. If you have forgotten to enable `mod_rewrite`, these directives mean that your rewrite rules don't work and you get no error message about why they don't work. Without them, Apache fails to start and tells you why. – Ladadadada Nov 30 '17 at 11:37

1 Answer


To forbid a page, a rewrite rule should contain [F] rather than [R=403].

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AlphaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule ^ - [L,F]
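
One other thing worth checking, on the assumption that these rules live in the main body of httpd.conf rather than inside your <VirtualHost> blocks: mod_rewrite configuration is not inherited by virtual hosts by default, so server-level rules are ignored for vhost requests unless each vhost opts in with something like:

# inside each <VirtualHost> (or an Included file) that should use the global rules
RewriteEngine On
RewriteOptions Inherit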

You're correct in your guess about mod_security. DETECTION_ONLY means that it won't actually forbid anything, just detect and log what it would do. You'll want to look through your configuration for SecRuleEngine DetectionOnly and comment it out.
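
Assuming the usual layout where that setting lives in modsecurity.conf (the exact filename and path vary by distribution), the change is simply:

# was: SecRuleEngine DetectionOnly
SecRuleEngine On

followed by a reload of Apache.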


The problem with the config that starts with <Location "/var/www/sites/"> is that /var/www/sites is a directory on the filesystem rather than being a path in the URL. <Location> refers to URLs and <Directory> refers to filesystem paths.

You can either use:

<Directory "/var/www/sites/">

or

<Location "/">

I can't see any way that first snippet could allow .ht* files. The only thing it does is deny some bots. I think you are mistaken about what caused those files to be accessible. You can move all the config from your .ht* files into the Apache config to avoid that problem if you can't figure out the access issues.

The purpose of .htaccess files is to allow users who don't have permission to change the global Apache config a limited measure of control over their own directories. If you have permission to edit the global Apache config there's no need for .htaccess files.
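
If you do consolidate everything, a minimal sketch (again assuming /var/www/sites/ as the common document root) is to turn off .htaccess processing for that tree so that only httpd.conf is consulted:

<Directory "/var/www/sites/">
    # .htaccess files under this tree are no longer read at all
    AllowOverride None
</Directory>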

Ladadadada
  • Yes, I get what you're saying, but I assure you that with `<Location "/">` it did allow the `.ht*` files to be viewed, despite the `<Files>` directive. However, as soon as I changed `"/"` to something else (literally, anything else), the bots were no longer blocked, even though the `.ht*` files were protected again. I ended up using a `<Directory>` block instead. Thx – Cheddar Dec 01 '17 at 13:34