URLs with special/reserved characters are obviously not preferable; however, you can use them and they work, as defined (I believe) by RFC 3986.
For instance, `https://www.example.com/en/product/apple,orange.html` would be the same as `https://www.example.com/en/product/apple%2Corange.html`. If you rewrite these URLs with an nginx `rewrite` to a `product.php` script, they would rewrite to `product.php?param=apple,orange` and `product.php?param=apple%2Corange`. `product.php` will never know whether the user came through the URL-encoded URL or not; it will find `param=apple,orange` in both cases.
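To illustrate why the script cannot tell the two requests apart, here is a simplified Python model of that flow: the path is percent-decoded first (roughly what nginx does before `location` matching and `rewrite`), then a rewrite rule builds the query string, and the query is decoded again when the script reads it. The rewrite pattern is a hypothetical stand-in, since the actual nginx rule is not shown above, and the model ignores nginx's other normalization steps.

```python
import re
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical rule, standing in for an nginx directive such as
#   rewrite ^/en/product/(.+)\.html$ /product.php?param=$1 last;
REWRITE = re.compile(r"^/en/product/(.+)\.html$")


def normalized_path(request_target: str) -> str:
    """Percent-decode the path (a simplified stand-in for nginx's $uri)."""
    return unquote(urlsplit(request_target).path)


def param_seen_by_script(request_target: str) -> str:
    """Apply the rewrite to the decoded path and return what the script reads."""
    match = REWRITE.match(normalized_path(request_target))
    if match is None:
        raise ValueError(f"no rewrite rule matched {request_target!r}")
    rewritten = "/product.php?param=" + match.group(1)
    # parse_qs percent-decodes values, much like PHP filling $_GET.
    return parse_qs(urlsplit(rewritten).query)["param"][0]


for target in ("/en/product/apple,orange.html",
               "/en/product/apple%2Corange.html"):
    print(target, "->", param_seen_by_script(target))
# /en/product/apple,orange.html -> apple,orange
# /en/product/apple%2Corange.html -> apple,orange
```

Both request targets collapse to the same parameter, so any distinction has to be made before decoding happens.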
This should not be a problem, since they are really the same URL, just with or without a URL-encoded comma. Both URLs work perfectly.
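For reference, RFC 3986 is stricter here than the server is. Section 2.2 states that URIs differing only in the replacement of a reserved character with its percent-encoded octet are not equivalent, and the comma is a reserved (`sub-delims`) character. Section 6.2.2.2 only allows decoding percent-encoded *unreserved* characters during normalization. That may explain why a crawler keeps the two URLs apart even though nginx serves the same page for both. A minimal Python sketch of that RFC normalization (the function name is my own):

```python
import re

# RFC 3986 section 2.3 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(path: str) -> str:
    """Percent-encoding normalization per RFC 3986 sections 6.2.2.1 and 6.2.2.2."""

    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. %7E -> "~": decoding does not change meaning
        return "%" + match.group(1).upper()  # reserved or other: keep, uppercase hex

    return PCT_ENCODED.sub(fix, path)


plain = "/en/product/apple,orange.html"
encoded = "/en/product/apple%2Corange.html"
print(normalize_percent_encoding(encoded))  # /en/product/apple%2Corange.html
print(normalize_percent_encoding(plain) == normalize_percent_encoding(encoded))  # False
print(normalize_percent_encoding("/en/%7euser"))  # /en/~user
```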
Now to my point. Google seems to index both URLs separately. If so, this would cause duplicate content: two identical pages with different URLs. I thought they would be considered the same page.
After implementing hreflang on a site for international targeting, Google Search Console reports errors indicating that the originating URL `https://www.example.com/en/product/apple%2Corange.html` has an alternate URL `https://www.example.com/fr/produit/pomme,orange.html`, and this alternate URL does not link back to the originating URL. This is true, since the alternate URL links back to `https://www.example.com/en/product/apple,orange.html`. http://hreflang.ninja reports the exact same error. The website never links to the encoded URL `https://www.example.com/en/product/apple%2Corange.html`, but somehow it has been indexed, I suppose through an external link.
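The reported error can be reproduced with a small reciprocity check, assuming the checker compares URLs as exact strings (an assumption about how such tools behave, not documented behaviour). The page data below is hypothetical and omits hreflang self-references for brevity. Because both spellings of the English URL serve identical HTML, they declare the same French alternate, but the French page only links back to the unencoded spelling:

```python
from typing import Dict, List

FR = "https://www.example.com/fr/produit/pomme,orange.html"
EN = "https://www.example.com/en/product/apple,orange.html"
EN_ENCODED = "https://www.example.com/en/product/apple%2Corange.html"

# Hypothetical hreflang annotations found in each page's HTML.
PAGES: Dict[str, Dict[str, str]] = {
    EN: {"fr": FR},
    FR: {"en": EN},
}


def missing_return_links(
    origin: str, declared: Dict[str, str], pages: Dict[str, Dict[str, str]]
) -> List[str]:
    """List alternates declared by `origin` that do not link back to it.

    URLs are compared as exact strings, so an encoded and an unencoded
    spelling of the same path count as different URLs.
    """
    missing = []
    for alternate in declared.values():
        back_links = pages.get(alternate, {}).values()
        if origin not in back_links:
            missing.append(alternate)
    return missing


# Both spellings serve the same HTML, so they declare the same alternates.
print(missing_return_links(EN, PAGES[EN], PAGES))  # []
print(missing_return_links(EN_ENCODED, PAGES[EN], PAGES))
# ['https://www.example.com/fr/produit/pomme,orange.html']
```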
How can I make `https://www.example.com/en/product/apple%2Corange.html` 301-redirect to `https://www.example.com/en/product/apple,orange.html`, considering PHP does not know whether the URL contains encoded characters?
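Whatever layer performs the redirect, the decision can only be made against the raw, undecoded request target. My understanding is that nginx keeps that original value in `$request_uri` (while `$uri` and `rewrite` captures use the decoded, normalized form), and that the stock `fastcgi_params` file passes `$request_uri` to PHP as `REQUEST_URI`. The Python sketch below shows the redirect decision under that assumption; the function name and the choice to decode only `%2C` are illustrative, not a prescribed method.

```python
import re
from typing import Optional, Tuple

ENCODED_COMMA = re.compile(r"%2C", re.IGNORECASE)


def canonical_redirect(raw_request_uri: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) when the raw path spells the comma as %2C.

    `raw_request_uri` must be the undecoded request target (path plus an
    optional query string), e.g. the value of nginx's $request_uri.
    """
    path, sep, query = raw_request_uri.partition("?")
    # Decode only the comma; other escapes (e.g. %2F, %3F) would change the
    # meaning of the path if decoded, so they are left untouched.
    canonical_path = ENCODED_COMMA.sub(",", path)
    if canonical_path == path:
        return None  # already canonical: serve the page normally
    return 301, canonical_path + sep + query  # query string kept verbatim


print(canonical_redirect("/en/product/apple%2Corange.html"))
# (301, '/en/product/apple,orange.html')
print(canonical_redirect("/en/product/apple,orange.html"))  # None
print(canonical_redirect("/en/product/apple%2corange.html?ref=x"))
# (301, '/en/product/apple,orange.html?ref=x')
```

In PHP the equivalent would be a case-insensitive `preg_replace` on `$_SERVER['REQUEST_URI']`, followed by `header('Location: ' . $location, true, 301)` when the result differs.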
And yeah, I know `apple-orange.html` would be the preferred URL, but I'm interested in learning how to handle encoded characters in URLs.