
I have a Magento website running on Apache2, and every day bots crawl URLs that I do not want crawled. I would like to return 429 for specific URLs.

I have tried this in my .htaccess:

<ifModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Yahoo!\Slurp|semrushbot) [NC]
Redirect 429 /shop/ricerca/
</ifModule>

The bots are:

Yahoo Slurp, Googlebot, Yandex, Semrush, etc.

and the paths to protect are:

/path/toprotect/

/path/to/protect2/

Tailing the access logs, it seems my snippet is not working.

Can anyone help me?

If this turns out to be too messy, I'll just use nginx as a reverse proxy and filter bots there, where I have more experience.

x86fantini

1 Answer

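There are several errors:

  • RewriteCond only affects the RewriteRule that follows it. Redirect is a mod_alias directive, so it is evaluated independently of the condition. Use a RewriteRule with the R flag instead. The mod_rewrite documentation says of that flag:

Any valid HTTP response status code may be specified, using the syntax [R=305], with a 302 status code being used by default if none is specified. The status code specified need not necessarily be a redirect (3xx) status code. However, if a status code is outside the redirect range (300-399) then the substitution string is dropped entirely, and rewriting is stopped as if the L were used.

  • The pattern Yahoo!\Slurp is missing a space. The escape character \ is there to escape that space, so it should read Yahoo!\ Slurp.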
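As a variation on the escaping point, quoting the whole pattern should work as well, since Apache reads a double-quoted directive argument as a single token. A minimal sketch, using the same two bot names as the question:

```apache
# Quoting the pattern keeps the space in "Yahoo! Slurp" inside one argument,
# so no backslash escape is needed.
RewriteCond %{HTTP_USER_AGENT} "(Yahoo! Slurp|semrushbot)" [NC]
```

Either form is fine; the backslash version is used in the examples here.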

Result after these corrections, redirecting everything to /shop/ricerca/:

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (Yahoo!\ Slurp|semrushbot) [NC]
    RewriteRule "(.*)" "/shop/ricerca/" [L,R=302]
</IfModule>

Or, if you are not trying to redirect to /shop/ricerca/ but protect it with 429 Too Many Requests:

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (Yahoo!\ Slurp|semrushbot) [NC]
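    # In .htaccess the leading slash is stripped before matching, so it is optional here.
    # No trailing (.+): the query string is not part of the matched path, so
    # /shop/ricerca/?q=... must match on the bare prefix.
    RewriteRule "^/?shop/ricerca/" - [R=429]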
</IfModule>
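
The question also lists several bots and two paths to protect. A sketch of how the same rule might be extended to cover them, assuming Apache 2.4 and that a case-insensitive substring match on the User-Agent is enough to identify each crawler (the name list below is illustrative, so adjust it to what your access logs show):

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match any of the listed crawlers, case-insensitively.
    # Googlebot is left out on purpose here; add it only if you really want to throttle it.
    RewriteCond %{HTTP_USER_AGENT} (Yahoo!\ Slurp|semrushbot|yandex) [NC]
    # Match either protected prefix from the question:
    #   /path/toprotect/  and  /path/to/protect2/
    # The optional leading slash covers both .htaccess and virtual-host context.
    RewriteRule "^/?path/(toprotect|to/protect2)/" - [R=429]
</IfModule>
```

Because the status is outside the 3xx range, the `-` substitution is ignored and processing stops at this rule, so no `L` flag is needed.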
Esa Jokinen
  • "You can't use HTTP 429 Too Many Requests with a redirect." - Well, you can - but it's then obviously not a (3xx) _redirect_. But you can specify a non-3xx code with both the `Redirect` and `RewriteRule` directives. As you've quoted, "if a status code is outside the redirect range (300-399) then the substitution string is dropped entirely, and rewriting is stopped as if the L were used." - Apache then returns the stated HTTP status code together with the appropriate error document in the response. e.g. `RewriteRule foo - [R=429]`. – MrWhite May 16 '17 at 12:32
  • Also, `/shop/ricerca/` would seem to be the URL to protect, not the intended target URL. You don't want to issue a 3xx redirect when trying to _block_ bots. – MrWhite May 16 '17 at 12:33
  • Ok. The question was a bit ambiguous on that. I added another solution based on that interpretation, and explained what both are doing. – Esa Jokinen May 16 '17 at 15:42