I use rewrite on my nginx server so that URLs like https://www.example.com/en/product/apple.html pass en, product and apple.html to a single PHP script, like so:
rewrite ^/([a-zA-Z0-9_\-]+)/([a-zA-Z0-9_\-]+)/(.+)$ /index.php?lang=$1&page=$2&part=$3&$query_string last;
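To show why PHP cannot tell the two forms apart, here is a minimal Python sketch of roughly what the rule above hands to index.php. It assumes that nginx matches the rewrite regex against the normalized, percent-decoded URI. Only the percent-decoding step is emulated; nginx's other normalizations (resolving "." and "..", merging slashes) are left out. The function name rewrite_args is hypothetical.

```python
import re
from urllib.parse import unquote

# Same pattern as the nginx rewrite rule; this syntax means the same in Python's re.
REWRITE = re.compile(r"^/([a-zA-Z0-9_\-]+)/([a-zA-Z0-9_\-]+)/(.+)$")


def rewrite_args(raw_path: str):
    """Approximate the lang/page/part values PHP would receive.

    Assumption: nginx runs the regex against the percent-decoded URI,
    so decoding happens before the captures are taken.
    """
    match = REWRITE.match(unquote(raw_path))
    if match is None:
        return None  # the rule would not apply to this path
    lang, page, part = match.groups()
    return {"lang": lang, "page": page, "part": part}


# Both requests yield identical captures, so PHP sees the same "part".
print(rewrite_args("/en/product/apples,oranges.html"))
print(rewrite_args("/en/product/apples%2Coranges.html"))
```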
As you can see, the third part (in this case apple.html) matches any characters. When this part contains URL-encoded special characters, nginx seems to decode them on the fly, so PHP cannot tell whether the user requested the URL with the encoded or the decoded character. For example, for both /en/product/apples,oranges.html and /en/product/apples%2Coranges.html, PHP reads apples,oranges.html.
To avoid serving the same content under two URLs: can nginx rewrite the URL without decoding URL-encoded special/reserved characters, so that PHP can decide whether to redirect to the non-encoded URL? Or, even better, can nginx be configured to 301-redirect /en/product/apples%2Coranges.html to /en/product/apples,oranges.html?
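One way to approach the redirect decision is to look at the raw request URI rather than the rewritten arguments. In the fastcgi_params file shipped with nginx, REQUEST_URI is set from $request_uri, which holds the original, undecoded URI, so PHP should see %2C there. The Python sketch below illustrates the canonicalization logic such a script might apply; it is a stand-in for PHP, not a drop-in implementation. RFC 3986 does not treat a percent-encoded reserved character as equivalent to the literal one, so which characters to decode is a site policy: the whitelist SAFE_CHARS and the functions canonical_uri and redirect_target are hypothetical.

```python
import re

# Hypothetical policy: decode RFC 3986 unreserved characters plus the comma.
# Everything else (e.g. %2F, %3F, non-ASCII bytes) stays percent-encoded.
SAFE_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~,"
)

PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _canonical_escape(match: re.Match) -> str:
    char = chr(int(match.group(1), 16))
    if char in SAFE_CHARS:
        return char  # decode characters the site wants to show literally
    return "%" + match.group(1).upper()  # keep others, normalize hex case


def canonical_uri(raw_uri: str) -> str:
    """Canonicalize the path part of a raw request URI; keep the query as-is."""
    path, sep, query = raw_uri.partition("?")
    return PCT_ESCAPE.sub(_canonical_escape, path) + sep + query


def redirect_target(raw_uri: str):
    """Return the URI to 301-redirect to, or None if already canonical."""
    canonical = canonical_uri(raw_uri)
    return canonical if canonical != raw_uri else None


print(redirect_target("/en/product/apples%2Coranges.html"))
print(redirect_target("/en/product/apples,oranges.html"))
```

Leaving the query string untouched keeps this sketch focused on the path; decoding escapes inside query values could change how they are parsed.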
PS: I know the better URL would be /en/product/apples-oranges.html, without the comma. But since the web allows URLs with special characters such as the comma, I'm interested in learning how to deal with them.