My website software replaces space characters with '+' characters in the URL. A proper link would look like 'http://www.schirmacher.de/display/INFO/How+to+reattach+a+disk+to+XenServer', for example.
Some websites link to that article, but somehow their embedded editor can't handle the encoding, so what I see in the httpd log files is actually
GET /display/INFO/How%2525252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
which of course leads to a 404 error. It seems that the '+' character is encoded as '%2b' and then the '%' character is encoded as '%25' again and again (three times in this example).
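To make the suspected encoding chain concrete, here is a minimal sketch in Python (used only for illustration; the helper name `encode_plus` is made up). It assumes the broken editors simply percent-encode the link again each time. Under that assumption, four rounds in total (one for the '+', three for the '%') reproduce the sequence from the log.

```python
from urllib.parse import quote

def encode_plus(rounds: int) -> str:
    """Percent-encode a single '+' the given number of times (hypothetical helper)."""
    text = "+"
    for _ in range(rounds):
        # safe="" makes quote() encode every reserved character, including '+' and '%'
        text = quote(text, safe="")
    # quote() emits upper-case hex digits; the log shows lower case
    return text.lower()

for n in range(1, 5):
    print(n, encode_plus(n))
# 1 %2b
# 2 %252b
# 3 %25252b
# 4 %2525252b
```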
Since there are many such references to different pages from different websites, I would like to rewrite the URL so that visitors get the correct page.
Here's my attempt, which does not work:
RewriteRule ^(.*)%25(.*)$ $1%$2 [R=301]
What it is supposed to do is: take everything before the %25 string and everything after it, concatenate those strings with a '%' in between, then redirect.
With the example input URL the rule should rewrite to
/display/INFO/How%25252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
followed by a redirect, then it should rewrite to
/display/INFO/How%252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
and again to
/display/INFO/How%2bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
and so on. Finally, after a lot of redirects, I should be left with
/display/INFO/How%2bto%2breattach%2ba%2bdisk%2bto%2bXenServer
which is a valid url equivalent to /display/INFO/How+to+reattach+a+disk+to+XenServer.
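The intended chain of rewrites can be tried out outside of Apache. The following is only a sketch in Python, not a model of mod_rewrite itself: it assumes the pattern is applied to the URL exactly as it appears in the log, and the helper name `collapse` is made up.

```python
import re

PATTERN = re.compile(r"^(.*)%25(.*)$")  # same pattern as in the RewriteRule

def collapse(path: str) -> list[str]:
    """Apply the substitution until the pattern no longer matches.

    Returns every intermediate path, one per simulated redirect
    (hypothetical helper, for illustration only).
    """
    steps = []
    while True:
        match = PATTERN.match(path)
        if match is None:
            return steps
        # everything before the %25, a literal '%', everything after it
        path = match.group(1) + "%" + match.group(2)
        steps.append(path)

raw = ("/display/INFO/How%2525252bto%2525252breattach%2525252ba"
       "%2525252bdisk%2525252bto%2525252bXenServer")
steps = collapse(raw)
print(len(steps))   # 18
print(steps[-1])    # /display/INFO/How%2bto%2breattach%2ba%2bdisk%2bto%2bXenServer
```

In this sketch the example URL needs 18 steps (6 sequences with 3 superfluous '25' each), which would mean 18 redirects with [R=301]. Because `(.*)` is greedy, the last %25 in the URL is replaced first rather than the first one, but the end result is the same.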
My problem is that the expression does not match at all, so it does not even replace a single occurrence of %25.
I understand that there is a limit on the number of redirects and that I should really use the [N] flag. However, I don't even get the first step right.
@Ben Lee: thanks for your detailed answer. I have now spent several hours on that problem. Here's what I have found out:
Any '%25' string in the URL is converted to '%' before mod_rewrite sees it. So the RewriteRule ^(.*)%25(.*)$ does not match '%25' in the URL; it actually matches '%2525'.
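This finding can be reproduced with a small Python sketch. It assumes the server percent-decodes the request path exactly once before the pattern is tested, which is what the observation above suggests; `rule_matches` is a made-up helper name.

```python
import re
from urllib.parse import unquote

RULE = re.compile(r"^(.*)%25(.*)$")  # pattern from the original attempt

def rule_matches(raw_path: str) -> bool:
    """Decode once (assumed server behaviour), then test the pattern."""
    seen = unquote(raw_path)  # what the rule is assumed to see
    return RULE.match(seen) is not None

print(unquote("/display/INFO/How%2525252bto"))  # /display/INFO/How%25252bto
print(rule_matches("/a%252bb"))    # False: '%25' has already become '%'
print(rule_matches("/a%25252bb"))  # True: '%2525' in the URL leaves one '%25'
```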
The presence of a backslash before the '%' does not make a difference. It seems that the '%' sign is not interpreted as a backreference in my case, perhaps because there is no RewriteCond statement before the rule. But it is probably better to use the backslash anyway, just to be sure.
The line having [L,R=301] is incorrect. It issues a separate redirect for every %2b match, but the number of redirects a client will follow is limited, so it fails if the URL contains more sequences than that.
Here are the mod_rewrite lines I am using:
RewriteRule ^(.*)\%25(.*\%25.*)$ $1%$2 [N]
RewriteRule ^(.*)\%25(.*)$ $1%$2 [R=301,L]
RewriteRule ^(.*)\%2b(.*\%2b.*)$ $1+$2 [N]
RewriteRule ^(.*)\%2b(.*)$ $1+$2 [R=301,L]
The third line will replace all but one of the %2b sequences with a '+' character. When there is only one %2b sequence left, the fourth line will match, forcing a redirect.
The first and second lines are basically the same but with a %25 sequence. It is necessary to have a rule with an [R] flag for each possible character sequence because I am also using mod_proxy / mod_jk, and the redirect makes sure that the resulting URL is fed to each module again. Otherwise httpd would attempt to fetch the URL from disk, which would fail in my case.
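How the four lines interact can be sketched in Python. This is a simplified model, not mod_rewrite itself: it assumes [N] restarts processing at the first rule, that a rule with [R=301,L] stops processing and redirects, and that the path passed in is what the rules see after decoding. How the redirect target is escaped and decoded again on the next request is not modelled. `run_rules` is a made-up helper name.

```python
import re

# (pattern, replacement template, flag): "N" restarts from the first rule,
# "R" stops processing and redirects to the rewritten path.
RULES = [
    (re.compile(r"^(.*)%25(.*%25.*)$"), r"\1%\2", "N"),
    (re.compile(r"^(.*)%25(.*)$"), r"\1%\2", "R"),
    (re.compile(r"^(.*)%2b(.*%2b.*)$"), r"\1+\2", "N"),
    (re.compile(r"^(.*)%2b(.*)$"), r"\1+\2", "R"),
]

def run_rules(path, max_restarts=1000):
    """Return (rewritten_path, redirected) for one request (hypothetical helper)."""
    for _ in range(max_restarts):
        for pattern, template, flag in RULES:
            match = pattern.match(path)
            if match is None:
                continue  # rule does not apply, try the next one
            path = match.expand(template)
            if flag == "R":
                return path, True  # redirect, rule processing ends here
            break  # "N": start over at the first rule
        else:
            return path, False  # no rule matched, nothing to redirect
    raise RuntimeError("too many restarts")

seen = "/display/INFO/How%2bto%2breattach%2ba%2bdisk%2bto%2bXenServer"
print(run_rules(seen))
# ('/display/INFO/How+to+reattach+a+disk+to+XenServer', True)
```

In this model a path that contains only %2b sequences is cleaned up with a single redirect. A path that still contains %25 sequences may need more than one pass: the second line fires as soon as only one %25 is left, even if the replacement produces a new %25 at the same position.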