
URLs containing special or reserved characters are obviously not preferable; however, you can use them and they work, as defined (I believe) by RFC 3986.

For instance, https://www.example.com/en/product/apple,orange.html would be the same as https://www.example.com/en/product/apple%2Corange.html. If you rewrite these URLs to a product.php script with an nginx rewrite rule, they become product.php?param=apple,orange and product.php?param=apple%2Corange respectively. product.php will never know whether the user came through the URL-encoded URL or not; it will find param=apple,orange in both cases.

This should not be a problem, since they are really the same URL, just with or without a URL-encoded comma. Both URLs work perfectly.
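To make the equivalence described here concrete, the following Python snippet (Python is used only for illustration; the actual stack is nginx and PHP) shows that the two URLs differ as plain strings but become identical once the path is percent-decoded. That decoding is the step the web server performs before a rewrite rule or script sees the path. One nuance: RFC 3986 (section 2.2) states that URIs differing only in whether a reserved character is replaced by its percent-encoded octet are not equivalent, and `,` is a reserved sub-delimiter. Treating the two URLs as the same page is therefore a server-side decision, not something the spec guarantees to crawlers.

```python
from urllib.parse import unquote, urlsplit

# The two URLs from the question: identical except that the comma in the
# second path is percent-encoded as %2C.
plain = "https://www.example.com/en/product/apple,orange.html"
encoded = "https://www.example.com/en/product/apple%2Corange.html"

# Compared as raw strings, they are different URLs.
print(plain == encoded)  # False

# After percent-decoding the path (what the web server does before handing
# the request to a rewrite rule or script), they are the same.
plain_path = unquote(urlsplit(plain).path)
encoded_path = unquote(urlsplit(encoded).path)
print(plain_path == encoded_path)  # True
print(encoded_path)  # /en/product/apple,orange.html
```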

Now to my point. It seems Google indexes both URLs separately. If so, this creates duplicate content: two identical pages with different URLs. I thought they would be considered the same page.

After implementing hreflang on a site for international targeting, Google Search Console reports errors indicating that:

> Originating URL https://www.example.com/en/product/apple%2Corange.html has an alternate URL https://www.example.com/fr/produit/pomme,orange.html and this alternate URL does not link back to the originating URL.

This is true, since the alternate URL links back to https://www.example.com/en/product/apple,orange.html instead. http://hreflang.ninja reports the exact same error. The website never links to the encoded URL https://www.example.com/en/product/apple%2Corange.html, but it has somehow been indexed, I suppose through an external link.

How can I make https://www.example.com/en/product/apple%2Corange.html issue a 301 redirect to https://www.example.com/en/product/apple,orange.html, given that PHP does not know whether the URL contained encoded characters?
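One possible angle: the undecoded request URI usually still exists. The nginx documentation describes `$request_uri` as the full original request URI (with arguments), while `$uri` is the normalized, decoded one, and the stock `fastcgi_params` file passes `$request_uri` to the script as `REQUEST_URI`. The Python sketch below illustrates only the redirect logic and assumes it receives that raw value; the function name `redirect_target` and the `ENCODED_COMMA` pattern are hypothetical. It turns `%2C`/`%2c` in the path back into a literal comma and returns the target for a 301, or `None` when the URL is already canonical. The same comparison could in principle be done on `$_SERVER['REQUEST_URI']` in PHP, or on `$request_uri` in an nginx `if` block, but neither is shown here.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern for a percent-encoded comma. Hex digits in
# percent-encoding are case-insensitive, so %2C and %2c both match.
ENCODED_COMMA = re.compile(r"%2C", re.IGNORECASE)


def redirect_target(raw_request_uri: str) -> Optional[str]:
    """Return the URI to 301-redirect to, or None if no redirect is needed.

    Assumption: raw_request_uri is the *undecoded* request target, such as
    the value nginx exposes as $request_uri.
    """
    parts = urlsplit(raw_request_uri)
    canonical_path = ENCODED_COMMA.sub(",", parts.path)
    if canonical_path == parts.path:
        return None  # already canonical: serve the page normally
    # Only the path is canonicalised; the query string is kept as sent.
    return urlunsplit(("", "", canonical_path, parts.query, ""))


for uri in ("/en/product/apple%2Corange.html",
            "/en/product/apple,orange.html",
            "/en/product/apple%2corange.html?ref=x"):
    print(uri, "->", redirect_target(uri))
```

Decoding only the comma, rather than fully decoding and re-encoding the path, keeps the rule narrow: other escaped characters are left exactly as the client sent them and cannot trigger unexpected redirects.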

And yes, I know apple-orange.html would be the preferred URL, but I'm interested in learning how to handle encoded characters in URLs.

jonr
  • Possible duplicate of [mod\_rewrite urlencoding an already urlencoded query string parameter - any way to disable this?](https://stackoverflow.com/questions/6520484/mod-rewrite-urlencoding-an-already-urlencoded-query-string-parameter-any-way-t) –  Jan 17 '18 at 20:40
  • P.S. It's not PHP but the web server doing the automatic decoding. –  Jan 17 '18 at 20:42
  • @rtfm I removed the reference to mod_rewrite; I'm interested in an nginx solution. The question you reference gives me a hint about a "no escape" parameter, but it only covers Apache's mod_rewrite. Let's see if there's an nginx equivalent, thanks. – jonr Jan 17 '18 at 21:17
  • You may be better off asking on https://serverfault.com/ –  Jan 17 '18 at 21:22
  • @rtfm Also, it seems the problem in that question was that the URL was being encoded upon rewrite. That is not the case here; here it is being decoded upon rewrite. PHP reads a comma in both cases. – jonr Jan 17 '18 at 21:25
