
I've managed to create an .htaccess file that does what I wanted (see explanations and questions after the code block):

<IfModule mod_rewrite.c>

RewriteEngine On

#1 If the requested file is not url-mapper.php (to avoid .htaccess loop)
RewriteCond %{REQUEST_FILENAME} (?<!url-mapper\.php)$

#2 If the requested URI does not end with an extension OR if the URI ends with .php*
RewriteCond %{REQUEST_URI} !\.(.*) [OR]
RewriteCond %{REQUEST_URI} \.php.*$ [NC]

#3 If the requested URI is not in an excluded location
RewriteCond %{REQUEST_URI} !^/seo-urls\/(excluded1|excluded2)(/.*)?$

#Then serve the URI via the mapper
RewriteRule .* /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [L,QSA]

</IfModule>

This is what the .htaccess should do:

  1. Rule#1 checks that the requested file is not url-mapper.php (to avoid infinite rewrite loops). This file will always be at the root of the domain.

  2. Rule#2 ensures the .htaccess only catches URLs that don't end with an extension (www.example.com --> catch | www.example.com/catch-me --> catch | www.example.com/dont-catch.me --> don't catch) and URLs ending with .php* (.php, .php4, .php5, .php123, ...).

  3. Rule#3 allows some directories (and their subdirectories) to be excluded from the .htaccess (in this case /seo-urls/excluded1 and /seo-urls/excluded2).

  4. Finally, the .htaccess feeds the mapper a hidden GET parameter named uri containing the requested URI.
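To make the intended behavior concrete, here is a rough Python model of the three conditions as written in the question. Python's re module stands in for Apache's PCRE (the constructs used behave the same for these ASCII sample paths). The document root used to fake REQUEST_FILENAME, the function name and the sample URLs are assumptions for illustration only.

```python
import re

# Hypothetical document root, used only to fake REQUEST_FILENAME.
DOC_ROOT = "/var/www/html"


def question_rules_apply(request_uri: str) -> bool:
    """Return True if the question's conditions would send the request to the mapper."""
    request_filename = DOC_ROOT + request_uri
    # 1: REQUEST_FILENAME must not end with url-mapper.php
    c1 = re.search(r"(?<!url-mapper\.php)$", request_filename) is not None
    # 2a [OR] 2b: no dot anywhere in the URI, or ".php" followed by anything
    c2a = re.search(r"\.(.*)", request_uri) is None
    c2b = re.search(r"\.php.*$", request_uri, re.IGNORECASE) is not None
    # 3: the URI must not be inside an excluded directory
    c3 = re.search(r"^/seo-urls\/(excluded1|excluded2)(/.*)?$", request_uri) is None
    # [OR] joins 2a with 2b; everything else is ANDed together
    return c1 and (c2a or c2b) and c3


for uri in ["/", "/catch-me", "/dont-catch.me", "/page.php5",
            "/seo-urls/excluded1/x", "/seo-urls/url-mapper.php"]:
    print(uri, "->", "mapper" if question_rules_apply(uri) else "untouched")
```

Here "/" stands for a request to www.example.com itself.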

Even though I tested it and everything works, I want to know if what I'm doing is correct (and if it's the "best" way to do it). I've learned a lot with this "project", but I still consider myself a beginner with .htaccess and regular expressions, so I want to triple-check it here before putting it into production...

MrWhite
AlexV

2 Answers

<IfModule mod_rewrite.c>

No need for the <IfModule> wrapper. See this question on the Webmasters stack: https://webmasters.stackexchange.com/questions/112600/is-checking-for-mod-write-really-necessary

#1 If the requested file is not url-mapper.php (to avoid .htaccess loop)
RewriteCond %{REQUEST_FILENAME} (?<!url-mapper\.php)$

There's no need for the negative lookbehind, (?<!...). It would be preferable to simply use a negated regex, i.e. one with a ! prefix. For example: !^/url-mapper\.php$ (i.e. not "/url-mapper.php"). Also, check against REQUEST_URI (the full URL-path) instead of REQUEST_FILENAME (the absolute filesystem path).
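A small Python sketch comparing the two approaches, with re standing in for PCRE (the function names and both sample paths are made up). A condition with a ! prefix passes when its regex does not match.

```python
import re


def lookbehind_passes(request_filename: str) -> bool:
    # (?<!url-mapper\.php)$ succeeds unless the string ends in url-mapper.php
    return re.search(r"(?<!url-mapper\.php)$", request_filename) is not None


def negated_passes(request_uri: str) -> bool:
    # !^/url-mapper\.php$ passes when the anchored regex does NOT match
    return re.search(r"^/url-mapper\.php$", request_uri) is None


print(lookbehind_passes("/var/www/html/url-mapper.php"))       # False
print(negated_passes("/url-mapper.php"))                       # False
print(lookbehind_passes("/var/www/html/blog/url-mapper.php"))  # False
print(negated_passes("/blog/url-mapper.php"))                  # True
```

Both forms leave the mapper itself alone; the anchored, negated version is simply more specific about which url-mapper.php it means, as the last two lines show.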

However, you state that "this file will always be at the root of the domain", but you are rewriting to /seo-urls/url-mapper.php - which is not at the root of the domain.

So, this should strictly be written like this instead:

RewriteCond %{REQUEST_URI} !^/seo-urls/url-mapper\.php$

If you are confident there is only one url-mapper.php file on your system, then the regex could be reduced to !/url-mapper\.php$.
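The trade-off between the two patterns, sketched in Python with re standing in for PCRE (the second URL is hypothetical). Here True means the regex matches, so the ! condition would fail and the request would not be rewritten.

```python
import re

ANCHORED = r"^/seo-urls/url-mapper\.php$"  # only this exact URL-path
REDUCED = r"/url-mapper\.php$"             # any url-mapper.php, in any directory

for uri in ["/seo-urls/url-mapper.php", "/other/url-mapper.php"]:
    print(uri, bool(re.search(ANCHORED, uri)), bool(re.search(REDUCED, uri)))
# /seo-urls/url-mapper.php True True
# /other/url-mapper.php False True
```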

Alternatively, check this in the RewriteRule pattern instead (since you aren't doing anything else with the RewriteRule pattern currently). This would be marginally more efficient.

#2 If the requested URI does not end with an extension OR if the URI ends with .php*
RewriteCond %{REQUEST_URI} !\.(.*) [OR]
RewriteCond %{REQUEST_URI} \.php.*$ [NC]

The first condition is too general: because it is negated, it fails for any URL that simply contains a dot anywhere in the URL-path, not just one that ends with a file extension. There's also no need for the capturing subpattern. Assuming a file extension is between 2 and 4 characters, a pattern like !\.\w{2,4}$ would more accurately target file extensions only.
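A quick Python comparison of the two conditions, with re standing in for PCRE (function names and sample paths are hypothetical). Each function returns True when the negated condition passes, i.e. when the URL would be treated as having no extension.

```python
import re


def old_no_extension(uri: str) -> bool:
    # !\.(.*) passes only if there is no dot anywhere in the URI
    return re.search(r"\.(.*)", uri) is None


def new_no_extension(uri: str) -> bool:
    # !\.\w{2,4}$ passes unless the URI ends in a 2-4 character extension
    return re.search(r"\.\w{2,4}$", uri) is None


for uri in ["/catch-me", "/some.dir/file", "/dont-catch.me"]:
    print(uri, old_no_extension(uri), new_no_extension(uri))
# /catch-me True True
# /some.dir/file False True
# /dont-catch.me False False
```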

Regex should generally be as restrictive as possible. The second condition, which matches .php extension(s), could also be made more restrictive; it currently matches .php anywhere in the URL-path. For example: \.php\d{0,4}$ - although realistically, you probably only have at most 1 digit after .php, so \.php\d?$ would probably be better.
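The difference between the loose and tight .php patterns, sketched in Python with re standing in for PCRE (sample URLs are hypothetical):

```python
import re

LOOSE = r"\.php.*$"   # the question's condition
TIGHT = r"\.php\d?$"  # the suggested replacement

for uri in ["/index.php", "/index.php5", "/a.php/extra/path", "/info.phpx.html"]:
    print(uri, bool(re.search(LOOSE, uri)), bool(re.search(TIGHT, uri)))
# /index.php True True
# /index.php5 True True
# /a.php/extra/path True False
# /info.phpx.html True False
```

Note that \.php\d?$ does not match .php123; if that case is genuinely needed, \.php\d{0,4}$ covers it.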

I'd also question whether the NC (case-insensitive) flag is really required here. Do you really need to match .PHP or .PhP?
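What [NC] changes, sketched with Python's re.IGNORECASE as the equivalent of the flag (the URL is hypothetical):

```python
import re

# [NC] makes the comparison case-insensitive, like re.IGNORECASE here.
print(bool(re.search(r"\.php\d?$", "/INDEX.PHP")))                 # False
print(bool(re.search(r"\.php\d?$", "/INDEX.PHP", re.IGNORECASE)))  # True
```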

#3 If the requested URI is not in an excluded location
RewriteCond %{REQUEST_URI} !^/seo-urls\/(excluded1|excluded2)(/.*)?$

OK, but the trailing (/.*)?$ is superfluous if you simply want to match anything that starts with /seo-urls/excluded1 etc. (strictly, dropping it also matches sibling paths such as /seo-urls/excluded10). Also, there's no need to backslash-escape slashes (curious that you've escaped the middle slash, but not the other two).
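A Python sketch (re standing in for PCRE; sample paths hypothetical) comparing the question's pattern, the shorter prefix form, and a hypothetical (/|$) variant that keeps the directory boundary. True means the URL would be excluded.

```python
import re

WITH_TAIL = r"^/seo-urls/(excluded1|excluded2)(/.*)?$"  # the question's pattern
PREFIX_ONLY = r"^/seo-urls/(excluded1|excluded2)"       # the shorter form
BOUNDARY = r"^/seo-urls/(excluded1|excluded2)(/|$)"     # hypothetical alternative

for uri in ["/seo-urls/excluded1", "/seo-urls/excluded2/a/b", "/seo-urls/excluded10"]:
    print(uri, [bool(re.search(p, uri)) for p in (WITH_TAIL, PREFIX_ONLY, BOUNDARY)])
# /seo-urls/excluded1 [True, True, True]
# /seo-urls/excluded2/a/b [True, True, True]
# /seo-urls/excluded10 [False, True, False]
```

If no sibling directory ever shares a prefix with an excluded one, the shorter form is fine.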

#Then serve the URI via the mapper
RewriteRule .* /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [L,QSA]

OK, although the regex .* (which matches everything) could be simplified to ^, which succeeds for every URL-path but doesn't actually consume any characters, and that is all you require here.
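The distinction can be seen with Python's re (standing in for PCRE; the sample paths are hypothetical): both patterns succeed on every input, but ^ consumes nothing.

```python
import re

for path in ["", "catch-me", "seo-urls/page"]:
    caret = re.search(r"^", path)  # always succeeds, consumes nothing
    star = re.search(r".*", path)  # always succeeds, consumes the whole path
    print(repr(caret.group(0)), repr(star.group(0)))
# '' ''
# '' 'catch-me'
# '' 'seo-urls/page'
```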

However, I mentioned at the top about performing the url-mapper.php check in the RewriteRule pattern instead. So, this would be written:

RewriteRule !^seo-urls/url-mapper\.php$ /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [L,QSA]

The RewriteRule pattern is processed first, before any of the conditions, so it's always preferable to do what you can in the RewriteRule directive, rather than separating it into another condition.
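A simplified Python model of that ordering. Here re stands in for PCRE, all conditions are simply ANDed (so [OR] isn't modeled), and the function and condition names are made up. For the mapper's own URL-path, the negated pattern fails and no condition is evaluated at all.

```python
import re


def rule_applies(url_path, pattern, conditions, evaluated):
    """Simplified model: test the RewriteRule pattern first, then the conditions."""
    negated = pattern.startswith("!")
    regex = pattern[1:] if negated else pattern
    matched = re.search(regex, url_path) is not None
    if matched == negated:      # pattern failed, so no RewriteCond is evaluated
        return False
    for name, check in conditions:
        evaluated.append(name)  # record which conditions actually ran
        if not check(url_path):
            return False
    return True


conds = [("no extension", lambda p: re.search(r"\.\w{2,4}$", p) is None)]
for path in ["seo-urls/url-mapper.php", "catch-me"]:
    evaluated = []
    print(path, rule_applies(path, r"!^seo-urls/url-mapper\.php$", conds, evaluated), evaluated)
# seo-urls/url-mapper.php False []
# catch-me True ['no extension']
```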

Summary

Bringing the above points together, we have:

RewriteEngine On

#2 If the requested URI does not end with an extension OR if the URI ends with .php*
RewriteCond %{REQUEST_URI} !\.\w{2,4}$ [OR]
RewriteCond %{REQUEST_URI} \.php\d?$

#3 If the requested URI is not in an excluded location
RewriteCond %{REQUEST_URI} !^/seo-urls/(excluded1|excluded2)

#Then serve the URI via the mapper
RewriteRule !^seo-urls/url-mapper\.php$ /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [QSA,L]
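Here is the summary ruleset modeled in Python as a rough check. re stands in for PCRE, the function name and sample URLs are hypothetical, and query-string/QSA handling is left out. In a root .htaccess, the RewriteRule pattern sees the URL-path without its leading slash, whereas %{REQUEST_URI} keeps it.

```python
import re


def summary_rewrite(request_uri):
    """Model of the summary ruleset; returns the rewrite target or None."""
    url_path = request_uri.lstrip("/")  # what the RewriteRule pattern sees
    if re.search(r"^seo-urls/url-mapper\.php$", url_path):  # negated pattern fails
        return None
    no_ext = re.search(r"\.\w{2,4}$", request_uri) is None     # condition 2a [OR]
    is_php = re.search(r"\.php\d?$", request_uri) is not None  # condition 2b
    excluded = re.search(r"^/seo-urls/(excluded1|excluded2)", request_uri) is not None
    if (no_ext or is_php) and not excluded:
        return "/seo-urls/url-mapper.php?uri=" + request_uri  # QSA not modeled
    return None


for uri in ["/", "/catch-me", "/page.php", "/style.css",
            "/seo-urls/excluded2/x", "/seo-urls/url-mapper.php"]:
    print(uri, "->", summary_rewrite(uri))
```

The last case matters because, after an internal rewrite in .htaccess, the rewritten URL is run through the rules again. Without the negated RewriteRule pattern, /seo-urls/url-mapper.php would satisfy the .php condition and be rewritten once more.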
MrWhite

Condition 2a will allow /some.dir/file because of the "." in the middle. You might want something like !\..[1,3], though it would depend on your file/directory structure.

In the RewriteRules it's normal to use substitution matching instead of variable names:
RewriteRule (.*) /seo-urls/url-mapper.php?uri=$5 [L,QSA]
Note it's $5 because you currently have a match patterns all over your conditions.
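A Python sketch of how $n backreferences are numbered. re stands in for PCRE, and the function name and sample path are hypothetical. As MrWhite's comment below points out, $n refers only to capturing groups in the RewriteRule pattern, so with (.*) as the pattern the URL-path is $1. Unlike %{REQUEST_URI}, it lacks the leading slash in a root .htaccess.

```python
import re


def substitute(pattern, url_path, target):
    """Expand $0-$9 in target using groups from the RewriteRule pattern only."""
    m = re.search(pattern, url_path)

    def expand(ref):
        n = int(ref.group(1))
        if n > m.re.groups:
            return ""  # this sketch substitutes "" for a group that doesn't exist
        return m.group(n) or ""

    return re.sub(r"\$(\d)", expand, target)


print(substitute(r"(.*)", "catch-me", "/seo-urls/url-mapper.php?uri=$1"))
# /seo-urls/url-mapper.php?uri=catch-me
```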

Chris S
  • I'm not sure I understand that $5... Can you please explain it to me a bit more? Thanks! – AlexV Jan 15 '11 at 14:17
  • In a Cond/Rule block you can create match blocks with parentheses "()", then in the final rule substitution you can pull those matches back in by using a "$" followed by the match number. Usually a Cond doesn't have any; then in the final Rule you use them to reorder bits and pieces of the URL, though it's very useful to pull in other bits and pieces. Your Conds have them all over the place, but they aren't actually used anywhere. – Chris S Jan 16 '11 at 01:24
  • Unfortunately, everything about this answer is wrong! "2a will allow /some.dir/file" - No it doesn't - it _blocks_ everything containing a dot. "something like `!\..[1,3]`" - should be `!\..{1,3}`. "it's normal to use substitution matching instead of variable names" - it's just as "normal" to use variables; in fact, it is arguably preferable to do so. "Note it's $5 because you currently have a match patterns all over your conditions." - No, it's `$1`. `$n` refers only to the `RewriteRule` _pattern_. `%n` backreferences refer to conditions - but then only the last matched condition. – MrWhite Aug 08 '20 at 17:23
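A quick Python check of the first two points in that comment (re standing in for PCRE; the second sample path is made up):

```python
import re

path = "/some.dir/file"
# !\.(.*) passes only if there is no dot at all, so this URL is NOT rewritten
print(re.search(r"\.(.*)", path) is None)        # False
# the unanchored \..{1,3} also matches ".dir", so !\..{1,3} fails here too
print(re.search(r"\..{1,3}", path).group(0))     # .dir
# [1,3] is a character class ("1", "," or "3"), not a repetition count
print(re.search(r"\..[1,3]", "/a.b3").group(0))  # .b3
```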