I'm working on a site where the plan is to move URLs from a query-string format to a number-based format. Many of the existing URLs contain unescaped accented and similar UTF-8 characters. The problem? I can't get Apache2 to match the accented characters and do the rewrite. I am doing all of this in the Apache2 config.

For example, this URL:

http://great.website.example.com/?place=cafe

Will work as expected with this Apache2 RewriteRule setting:

  RewriteCond %{QUERY_STRING} ^(place|location)=cafe
  RewriteRule ^/find/$ /find/1234? [L,R=301]

Now look at this URL. Note the accented é:

http://great.website.example.com/?place=café

Why doesn't that URL work with the following Apache2 RewriteRule setting?

  RewriteCond %{QUERY_STRING} ^(place|location)=café
  RewriteRule ^/find/$ /find/1234? [L,R=301]

Both of these rules should rewrite the URL to the following:

http://great.website.example.com/find/1234

But the example with the accented é simply doesn’t work. Maybe a wildcard character would work, but I can’t seem to get that to work either.

Giacomo1968

3 Answers

Your /?place=café will be URL-encoded by the browser to /?place=caf%C3%A9, and that encoded form is what you should match.
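
Applied to the rule from the question, that could look like the sketch below. It assumes the client sends the é as UTF-8 (bytes C3 A9); a client using a different encoding would send different bytes.

```apache
# Match the percent-encoded UTF-8 bytes of "é" (C3 A9) instead of the literal character.
RewriteCond %{QUERY_STRING} ^(place|location)=caf%C3%A9
RewriteRule ^/find/$ /find/1234? [L,R=301]
```

%{QUERY_STRING} is compared as received, without unescaping. Adding [NC] to the RewriteCond would also accept lowercase hex digits (%c3%a9), should some clients send them.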

dsznajder
  • Ahhh! UTF8! I have a script to generate the RewriteRule stuff that I then put into the Apache2 config & I forgot to set it to encode UTF8 prior to URL encoding it! Thank you! – Giacomo1968 Mar 27 '13 at 02:28

You can use a RewriteMap to do the unescaping for you, like this:

RewriteMap unescape int:unescape

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   café
RewriteRule ^/find/$         /find/1234? [L,R]

In the second RewriteCond line I use %2, as %1 would contain either "location" or "place".

However, adding a lot of RewriteRules to your config in order to map words to numbers is going to be a big performance hit on your server, and will be hard to maintain. A better solution is to use a RewriteMap for that too.

For example, assume that /etc/apache2/places.txt contains:

café    1234
shop   1235
...

Then this would work for you:

RewriteMap unescape int:unescape
RewriteMap places   txt:/etc/apache2/places.txt

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   (.*)
RewriteRule ^/find/$         /find/${places:%1}? [L,R]

You can also use a RewriteMap based on a database query. That would be my preferred choice, as I could then offload the job of matching words to numbers to the content management system.
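
A minimal sketch of such a database-backed map follows. It assumes Apache 2.4 with mod_dbd loaded (the dbd map type is not in the 2.2 documentation linked below), and the driver, connection parameters, table name, and column names are all hypothetical placeholders.

```apache
# Assumption: Apache 2.4 + mod_dbd; driver, credentials, table and columns are hypothetical.
DBDriver mysql
DBDParams "host=localhost dbname=cms user=apache pass=secret"

RewriteMap unescape int:unescape
# %s is replaced by the lookup key; the query should return a single row.
RewriteMap places   "dbd:SELECT id FROM places WHERE name = %s"

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   (.*)
RewriteRule ^/find/$         /find/${places:%1}? [L,R]
```

The rewrite conditions are the same as in the text-file version; only the source of the map changes.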

You can find more details in the documentation: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritemap

Giacomo1968
Krist van Besien

In a related question, someone suggested using RewriteMap to call an external program to rewrite URLs.

Also: Perhaps the request is actually something different entirely? A browser might have internally translated the accented characters to url-encoded ASCII? E.g. '%20' rather than ' '.

jeffrey_t_b