
Many websites provide .htaccess code to block spambots, spam referrers, etc.

Some websites use '\' before the '.' in domain names. For example:

RewriteCond %{HTTP_REFERER} spamdomain\.com [NC,OR]

SetEnvIfNoCase Referer spamdomain\.com spambot=yes

But some websites use the domain name directly, without the backslash. For example:

RewriteCond %{HTTP_REFERER} spamdomain.com [NC,OR]

SetEnvIfNoCase Referer spamdomain.com spambot=yes

Now I'm confused about which syntax is correct. Do both versions work? Which one should I use?

peterh

1 Answer


The second argument to both the RewriteCond and SetEnvIfNoCase directives is a regular expression (regex), not an ordinary string. Regex uses a special syntax to define search patterns. In that syntax, the dot/period is a special (meta) character that matches any single character (except newlines by default). To match a literal dot (i.e. disable its special meaning), you need to backslash-escape it (i.e. put a backslash before the dot).

So, a regex like spamdomain\.com matches the literal string spamdomain.com anywhere in the string being tested. (In the first example, it matches spamdomain.com anywhere in the HTTP_REFERER server variable.) A regex like spamdomain.com (where the dot is not escaped) matches that too, but it also matches spamdomainAcom, spamdomainBcom, etc., because an unescaped dot matches any character.

RewriteCond %{HTTP_REFERER} spamdomain.com [NC,OR]

Strictly speaking, this is incorrect: the regex matches more than is intended. In practice this might not be a problem if a literal dot is the only character likely to appear in that position, but the dot should still be escaped.
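To show how the escaped form fits into a complete rule set, here is a minimal sketch. It assumes Apache 2.4 with mod_rewrite, mod_setenvif and mod_authz_core available and permitted in .htaccess. The second domain, another-spammer\.example, is hypothetical and only there to illustrate the OR chaining. The two options are alternatives, so you would normally pick one.

```apache
# Option 1: mod_rewrite. Respond with 403 Forbidden when the Referer matches.
RewriteEngine On
RewriteCond %{HTTP_REFERER} spamdomain\.com [NC,OR]
# Hypothetical second domain. The last condition in the chain has no OR flag.
RewriteCond %{HTTP_REFERER} another-spammer\.example [NC]
RewriteRule ^ - [F]

# Option 2: mod_setenvif. Flag the request, then deny flagged requests.
SetEnvIfNoCase Referer spamdomain\.com spambot=yes
<RequireAll>
    Require all granted
    Require not env spambot
</RequireAll>
```

In both options the pattern is a regex, so the dot is escaped. The `-` substitution in the RewriteRule and the `spambot=yes` assignment are ordinary strings and need no escaping.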

Reference:

MrWhite
  • Thanks for your reply. So using '\' before the dot is correct. –  Aug 28 '17 at 10:14
  • Yes, in the examples you posted. It depends on whether the argument is a regex or an ordinary string. `.htaccess` code gets copy/pasted a lot, so small errors can get perpetuated. For example, you might sometimes see the dots escaped in the second argument of the `RewriteRule` directive (the _substitution_), e.g. `RewriteRule ^foo$ http://example\.com/bar [R,L]` - in this case the 2nd argument is an ordinary string, not a regex, so the dots do not need to be escaped (although it doesn't break, because backslash escapes are still supported here). – MrWhite Aug 28 '17 at 10:23
  • You should read up about _regular expressions_ (a couple of links provided above). They are incredibly powerful for _pattern matching_. However, the syntax can be a bit cryptic in complex expressions. – MrWhite Aug 28 '17 at 10:26
  • Thanks a lot for your help. Can you please also tell me which of the following is correct: `RewriteCond %{HTTP_REFERER} domain\.com [NC,OR]` `RewriteCond %{HTTP_REFERER} ^http(s)?://(www.)?.*domain\.com.*$ [NC,OR]` `RewriteCond %{HTTP_REFERER} ^http://.*domain\.com/ [NC,OR]` I found these 3 different formats on different websites. So which one should I use to completely block the referrer domain? –  Aug 28 '17 at 10:29
  • They would probably all "work". However, the last one would not match a request coming from `https` (if `domain.com` doesn't use `https` then it makes no difference), and the second one is overly complex. Of the three, the first one is perhaps preferable IMO (it's easier to read), but none are ideal - they are all very generic and could match too much. For example, all 3 would match `http://www.not-this-domain.com/`. That may be OK for you, maybe not? It's your call. A pattern like `\.domain\.com` might be preferable. Generally, with regex, you should be as _specific_ as possible. – MrWhite Aug 28 '17 at 11:54
  • Note that a pattern like `domain\.com` is matched anywhere in the `HTTP_REFERER`. For example, it would match `http://example.com/path/to/domain.com`, which may not be desirable, but this is also very unlikely. (But all three of those examples would match this URL.) – MrWhite Aug 28 '17 at 11:56
  • Thanks. So I'll use the first one domain\.com to block the spam referrer domain. –  Aug 28 '17 at 12:11
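
Following the advice in the comments to be as specific as possible, here is one possible sketch of a pattern that is anchored to the host part of the Referer. It is an illustration rather than a pattern taken from the thread, and it assumes mod_rewrite is enabled and that a 403 response is the desired outcome.

```apache
RewriteEngine On
# Match only when the Referer's host is domain.com or a subdomain of it.
#   ^https?://     scheme, http or https
#   ([^/]+\.)?     optional subdomain part ending in a literal dot
#   domain\.com    the host itself, with the dot escaped
#   (/|:|$)        the host must end here: path, port, or end of string
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?domain\.com(/|:|$) [NC]
RewriteRule ^ - [F]
```

Under these assumptions the condition should match `https://www.domain.com/page` but not `http://www.not-this-domain.com/` or `http://example.com/path/to/domain.com`, since the pattern is tied to the start of the string and cannot cross a slash before reaching the domain.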