One of the main purposes of URL normalization is to avoid GET
requests on distinct URLs that produce the exact same result.
Now, I know you can check for the canonical tag, or even compare the two URLs' HTML to see whether they're the same; however, that means downloading the exact same resource twice, which defeats the purpose I stated above.
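For reference, this is roughly the full-download comparison I'm describing; a sketch in Python using the requests library, where the helper name and the canonical-tag regex are my own:

```python
import re
import requests

CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', re.IGNORECASE
)

def looks_like_same_page(url_a: str, url_b: str) -> bool:
    """Naive duplicate check: downloads both documents in full."""
    html_a = requests.get(url_a, timeout=10).text
    html_b = requests.get(url_b, timeout=10).text

    # If both pages declare a canonical URL, compare those.
    a, b = CANONICAL_RE.search(html_a), CANONICAL_RE.search(html_b)
    if a and b:
        return a.group(1) == b.group(1)

    # Otherwise fall back to comparing the raw HTML.
    return html_a == html_b
```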
Is there a way to check for duplicate content using only a HEAD request? If not, is there a way to download only the <head> section of a web page without fetching the entire document?
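The closest thing I can picture for the first question is comparing whatever validators a HEAD response exposes (ETag, Content-Length, Last-Modified), but those only hint at identical content and depend entirely on the server setting them consistently. A sketch with Python's requests:

```python
import requests

def head_fingerprint(url: str) -> tuple:
    """Cheap identity hints from a HEAD request; no body is transferred."""
    resp = requests.head(url, allow_redirects=True, timeout=10)
    headers = resp.headers
    return (
        headers.get("ETag"),
        headers.get("Content-Length"),
        headers.get("Last-Modified"),
    )

# Matching fingerprints only *suggest* the two URLs serve the same resource.
# head_fingerprint("https://example.com/a") == head_fingerprint("https://example.com/b")
```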
I can think of workarounds for the last one (for example, streaming the response and stopping once the head has been read, as sketched below); I just want to know if there's a more direct way.
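This is the kind of workaround I mean for the second question: issue a normal GET but stream it and stop reading once </head> has arrived. A sketch with Python's requests; the cutoff logic is my own approximation, not a standard HTTP feature:

```python
import requests

def fetch_head_section(url: str, chunk_size: int = 2048, max_bytes: int = 262_144) -> str:
    """Stream the response and stop as soon as </head> has been received.

    Still a GET, but the connection is closed early, so only the first part
    of the document is actually transferred.
    """
    buf = b""
    with requests.get(url, stream=True, timeout=10) as resp:
        encoding = resp.encoding or "utf-8"
        for chunk in resp.iter_content(chunk_size=chunk_size):
            buf += chunk
            # Stop once the head is complete (or after a sanity limit).
            if b"</head>" in buf.lower() or len(buf) >= max_bytes:
                break
    return buf.decode(encoding, errors="replace")
```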