
I have a web application whose spec recently changed to allow slashes in the names of some of its documents. As a result, I have had to change my .htaccess file to match slashes as well. However, I only want to match slashes that are encoded, i.e. catch %2F but not /.

Consider the following URL:

http://www.example.com/document/edit/STAT%2F12/

My .htaccess looks like:

RewriteRule ^document\/([a-z0-9-]+)?\/?([a-z0-9-\W\s]+)?\/?$ documents.php?request=$1&id=$2& [NC,QSA,L]

For the above request, the rule captures $id as 'STAT/12/' instead of 'STAT/12'. In other words, it matches the trailing slash even though it isn't encoded.

Please note that I have set AllowEncodedSlashes On.

Ben Carey
  • Are you sure this is even possible? I remember reading once that proxies etc. may decode urlencoded slashes in a request, so there are zero guarantees that slashes encoded in the initial client reach your application as encoded slashes. – ThiefMaster Nov 06 '15 at 00:53
  • @ThiefMaster That would be a very poor proxy, wouldn't it? How would the receiving server be able to tell stuff in the query string from path info? That's the whole point of the encoding. – miken32 Nov 06 '15 at 00:54
  • http://stackoverflow.com/questions/1957115/is-a-slash-equivalent-to-an-encoded-slash-2f-in-the-path-portion-of-a - apparently encoding only changes the meaning of otherwise special chars (like `?` starting the query string). But `/` is not a special char. – ThiefMaster Nov 06 '15 at 00:55
  • @ThiefMaster Basically, what you are both saying is that in order to have slashes in the sodding document names, I have to stop using 'pretty' URLs? – Ben Carey Nov 06 '15 at 00:55
  • No, I think Apache sees the encoded slash. It's not until it passes it down to PHP in your case that the encoding is lost. – miken32 Nov 06 '15 at 00:58
  • Usually you put something with slashes either at the end of the pretty url or have a known number of extra path segments after the variable part. In both of those cases you can easily figure out what's the path and what's another url segment. Usually you'd forward the full request uri to your application instead of adding rewrite rules for all your pretty URLs and do parsing and routing in your code. – ThiefMaster Nov 06 '15 at 01:08

1 Answer


That's because the character class [a-z0-9-\W\s] in your regexp also matches the slash: \W matches any non-word character, and / is one of them. Use a non-greedy capture (Apache's mod_rewrite uses PCRE, which supports it), or use a different character class.

RewriteRule ^document\/([a-z0-9-]+)?\/?([a-z0-9-\W\s]+?)?\/?$ documents.php?request=$1&id=$2& [NC,QSA,L]

The ? after the + makes the capture non-greedy (lazy): it captures as few characters as possible, so it stops before the trailing /.

https://regex101.com/r/uK8zM3/1
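To make the greedy versus lazy difference concrete, here is a minimal sketch that runs both patterns outside Apache. It uses Python's `re` module as a stand-in for the PCRE engine behind mod_rewrite, and it assumes the rule sees the path with `%2F` already decoded to `/`, as reported in the question. The helper name `capture_id` is made up for illustration.

```python
import re

# Pattern from the question: the second group is greedy.
GREEDY = re.compile(
    r"^document\/([a-z0-9-]+)?\/?([a-z0-9-\W\s]+)?\/?$", re.IGNORECASE
)
# Pattern from this answer: the second group is lazy (+?).
LAZY = re.compile(
    r"^document\/([a-z0-9-]+)?\/?([a-z0-9-\W\s]+?)?\/?$", re.IGNORECASE
)


def capture_id(pattern, path):
    """Return what would become $2 (the id), or None if there is no match."""
    match = pattern.match(path)
    return match.group(2) if match else None


# Assumption: this is the path as the rule sees it, with %2F decoded to "/".
decoded_path = "document/edit/STAT/12/"

greedy_id = capture_id(GREEDY, decoded_path)
lazy_id = capture_id(LAZY, decoded_path)

print(greedy_id)  # STAT/12/
print(lazy_id)    # STAT/12
```

The lazy group still expands across the inner slash, because `\W` matches it, and only leaves the final `/` to the optional `\/?` in front of `$`.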

URL-encoded characters will arrive at your server still encoded, so if all you need is to capture %2F where you weren't before, just allow % in addition to whatever worked previously. Your character class above allows whitespace, for example; I don't think you want that in a URL!
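As a rough illustration of allowing % in the class, the sketch below swaps the second character class for `[a-z0-9%-]`. This exact class is an assumption, not something taken from the question, and the example only applies if the pattern is matched against the still-encoded path. Python's `re` again stands in for PCRE, and `capture_encoded_id` is a hypothetical helper.

```python
import re

# Assumed character class: letters, digits, "%" and "-", but no literal "/".
ENCODED = re.compile(
    r"^document/([a-z0-9-]+)?/?([a-z0-9%-]+)?/?$", re.IGNORECASE
)


def capture_encoded_id(path):
    """Return what would become $2 (the id), or None if there is no match."""
    match = ENCODED.match(path)
    return match.group(2) if match else None


# Assumption: the path reaches the pattern with %2F still encoded.
print(capture_encoded_id("document/edit/STAT%2F12/"))  # STAT%2F12
print(capture_encoded_id("document/edit/test%2F/"))    # test%2F
# A literal, unencoded slash inside the id is rejected.
print(capture_encoded_id("document/edit/STAT/12/"))    # None
```

Because the class excludes a literal `/`, the trailing slash can never end up inside the capture, so no lazy quantifier is needed in this variant.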

miken32
  • Sadly this is not the issue. Apache is seeing %2F and / as the same character, as %2F is just an encoded slash... – Ben Carey Nov 06 '15 at 00:57
  • That's surprising. Have you tried `AllowEncodedSlashes NoDecode`? – miken32 Nov 06 '15 at 00:59
  • Even if that is the case, the non-greedy capture would not match the `/$` at the end of the URL. https://regex101.com/r/uK8zM3/1 – miken32 Nov 06 '15 at 01:04
  • I didn't even realise that existed, I will try that out :-). Does the non-greedy capture mean it wouldn't match more than one slash though? – Ben Carey Nov 06 '15 at 01:05
  • No, it will match multiple ones, but stops before the last `/$` combination because it's outside of the capture. – miken32 Nov 06 '15 at 01:06
  • That sounds very promising. I won't be able to test this until tomorrow as I am currently out of the office, but it sounds like you may have saved me! If so I will be sure to come back and accept the answer. Thank you very much! – Ben Carey Nov 06 '15 at 01:10
  • Unfortunately this didn't fix the issue, as it doesn't catch the last slash if I have the following URL: 'document/edit/test%2F/'. This only catches 'test', not 'test%2F'... – Ben Carey Nov 06 '15 at 10:16