Maybe I'm doing something stupid, but I can't get rid of an issue with .htaccess.

I'm trying to match a function name in a documentation site and I'm getting errors I can't understand. I should point out that I (think I) understand regular expression escaping, and I know what dot and backslash-dot mean.

So, I want to allow all of these:

example.com/foofunction
example.com/foofunction.php
example.com/function.foofunction
example.com/function.foofunction.php

These are the patterns I've tried. I don't understand the ones that cause errors, so many thanks to anyone who can explain any of them to me:

^function\.([A-Za-z0-9_-]+)(\.php)?$ -> works, but makes function. mandatory

^(function\.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error... OK, let's not escape the dot; after all, it will match any character, so it should work...

^(function.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error too! OK, just to try it, what about the dot outside the optional group?

^(function)?\.([A-Za-z0-9_-]+)(\.php)?$ -> works, OK, but it makes the dot mandatory. By the way, here are more crazy things:

^(function)?.([A-Za-z0-9_-]+)(\.php)?$ -> if the dot isn't escaped (imagine I want to allow any character), internal error too. Now I'll try to make the dot optional separately

^(function)?(\.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error too, I'm going crazy...

These are my attempts so far. I'm going to try an optional lookbehind and update with the results... Anyway, I'd love to understand why those regexes cause an internal error.

And if anyone knows of an "htaccess special regex exceptions" reference or something similar that I should read, it would be very welcome.

Thanks in advance to all of you guys.

Áxel Costas Pena

1 Answer

Use non-capturing groups for everything apart from the actual function name:

^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$

Let's break that down:

^                   # assert start of string
  (?:function\.)?   # optionally allow the string "function."
  ([A-Za-z0-9_\-]+) # capture the function name - this could be shortened to ([-\w]+)
  (?:\.php)?        # optionally allow the string ".php"
$                   # assert end of string

So your .htaccess would look (I guess) something like this:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$ doc.php?functionname=$1 [L,QSA]

IMPORTANT POINT and the actual solution in this case:

You must use a sensible combination of RewriteCond and (usually) the [L] flag to ensure that the rule matches only once.

mod_rewrite behaves in a slightly counter-intuitive way that is not always immediately apparent: in a per-directory (.htaccess) context, it keeps re-running the rules against the rewritten URL until a pass leaves the URL unchanged. So, let's say I use the rule outlined above:

RewriteRule ^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$ doc.php?functionname=$1

...and I supply to this rule the input function.myfunc.php. First, it will be rewritten to:

doc.php?functionname=myfunc

However, next time it will match again. And it will be rewritten to:

doc.php?functionname=doc

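...and this will keep happening over and over until Apache's internal redirect limit is reached (10 by default, controlled by the LimitInternalRecursion directive), at which point Apache will throw an error - which you will see on the client side as a 500 response.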

The solution to this depends on your exact use case, but a common solution (the one I used above) is to check whether the requested file exists before applying the rewrite rule. By doing this, on the second iteration the rule will not be applied, and the request will be allowed to fall through for further processing.
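As a rough sketch of another way to break the loop, assuming the target script is the doc.php used above and that it sits in the same directory as the .htaccess file, a pass-through rule placed before the main rule leaves requests for doc.php untouched:

```apache
RewriteEngine On

# Assumption: doc.php sits next to this .htaccess file.
# "-" means "do not change the URL"; [L] ends this pass, and because the
# URL is unchanged, no further pass is triggered.
RewriteRule ^doc\.php$ - [L]

# Anything else that looks like a function name is sent to doc.php.
RewriteRule ^(?:function\.)?([A-Za-z0-9_-]+)(?:\.php)?$ doc.php?functionname=$1 [L,QSA]
```

Unlike the file-existence conditions, this only exempts doc.php itself, so any other real file whose name matches the pattern would still be rewritten.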

The [L] flag is also commonly (over)used - this causes the current iteration of the rewrite process to stop, and start again at the next iteration. It effectively does the same thing as continue does to a loop in PHP.

Since Apache 2.3.9, a much more useful flag (for this situation) is available - [END]. This gives the behaviour most people expect from [L]: it causes the rewrite process to halt immediately with no further iterations, like the break construct in PHP. Using this would mean that the aforementioned RewriteConds are no longer necessary. However, because this is only available in 2.3.9+, it can't be safely used unless you know for certain it will be available in every environment you run on.
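For illustration, a minimal sketch of the same rule using [END], assuming an Apache version that supports the flag (2.3.9 or later, in practice 2.4) and the same doc.php target as above:

```apache
RewriteEngine On

# [END] stops all further rewrite processing for this request, so the
# rewritten doc.php URL is never fed back through the rule.
RewriteRule ^(?:function\.)?([A-Za-z0-9_-]+)(?:\.php)?$ doc.php?functionname=$1 [END,QSA]
```

Without the file-existence conditions, a direct request for doc.php (or for any other existing file whose name matches the pattern) would still be rewritten once on the first pass, so whether the conditions can really be dropped depends on what else lives in that directory.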

DaveRandom
  • Thanks for your answer, DaveRandom. I already use non-capturing groups in my regexes, and here they are mandatory in order to get the function name in $1 with the optional "function." portion. I simply removed them from my post to make the code more readable. What I mean is that using non-capturing groups or not makes no difference in this case - the internal error still happens. – Áxel Costas Pena Dec 17 '12 at 12:58
  • @Áxel Are you using a suitable combination of `RewriteCond`s and `[L]` flag to ensure you don't get stuck in a loop? Can you show (the relevant part of) your full .htaccess file instead of just the regex? – DaveRandom Dec 17 '12 at 13:00
  • Oh my god, Dave, that was my mistake. I was thinking about the issue the wrong way, assuming it was a regexp error, and in the end it was a beginner's mistake. I'm so ashamed. Please post that comment as a separate answer so I can mark it as accepted. I have two questions about your earlier code, but I'll message you privately so as not to clutter things here. – Áxel Costas Pena Dec 17 '12 at 13:06
  • @Áxel I have edited this answer to reflect the correct solution to your problem. – DaveRandom Dec 17 '12 at 13:19
  • @Áxel More info about rewrite flags added, worth a read (I think) – DaveRandom Dec 17 '12 at 13:24
  • Oh, I have never dealt with the chat system and I can't manage to invite you, so I'll ask my questions here, against the rules, if you don't mind me wasting your time... :S ... OK ... I have just read your new edit ... I will read your new post and then reconsider my questions... – Áxel Costas Pena Dec 17 '12 at 13:36
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/21272/discussion-between-daverandom-and-axel) – DaveRandom Dec 17 '12 at 13:40