
I'm using AWS S3 for static website hosting and serving the site through Cloudflare (including Cloudflare DNS). It is SEO best practice to drop the .html extension from URLs while also avoiding duplicate content. I was getting the result I wanted with nginx, and I'm wondering whether it's even possible with either S3 or Cloudflare. My gut tells me no.

The basic requirement is: example.com/about.html should be rewritten (not redirected) to example.com/about, so the page is served at the extensionless URL. The object stored on S3 should, obviously, keep its *.html name.

The one hack I've stumbled across is:

  1. Change the file name on the server to about (without the file extension).
  2. Then, in the s3 console, change the metadata content-type back to text/html.
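Step 2 is needed because upload tools typically infer the content type from the file extension, and an extensionless name gives them nothing to go on. A rough illustration, using Python's standard `mimetypes` module purely as a stand-in for that extension-based detection (an assumption; this is not the S3 console's actual logic, and `guessed_type` is a hypothetical helper):

```python
import mimetypes
from typing import Optional


def guessed_type(filename: str) -> Optional[str]:
    """Return the MIME type inferred from the file extension, or None.

    mimetypes stands in for extension-based detection in general;
    it is not what the S3 console runs internally.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type


if __name__ == "__main__":
    # "about.html" has an extension to go on; "about" does not.
    for name in ("about.html", "about"):
        print(f"{name!r} -> {guessed_type(name)}")
```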

I consider this a horrible "solution": visiting *.html results in a 404, unless you also create a duplicate file with the .html extension and then possibly add a URL-forwarding rule in Cloudflare. Not only is that very messy, it just doesn't scale.

Is there a better way?

JamesJosephFinn


> My gut tells me no.

Your gut is both correct and incorrect.

You can't quite have it both ways with S3, because implied extensions aren't supported. However, there is a way to do it while remaining (arguably) SEO-sane.

Instead of `about` → `about.html`, you can make `about` → `about/` → `about/index.html`.
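Renaming every page this way is mechanical, so it can be scripted before upload. A minimal sketch, assuming `index.html` is the index document suffix you configure on the bucket; `index_key` is a hypothetical helper, not part of any AWS tool:

```python
from pathlib import PurePosixPath


def index_key(html_path: str) -> str:
    """Map a flat page such as 'about.html' to 'about/index.html'.

    Assumes the bucket's index document suffix is 'index.html'.
    """
    path = PurePosixPath(html_path)
    if path.suffix != ".html":
        raise ValueError(f"not an .html page: {html_path}")
    if path.name == "index.html":
        return str(path)  # already an index document; leave it alone
    # Drop the extension and turn the page name into a "folder".
    return str(path.with_suffix("") / "index.html")
```

For example, `index_key("blog/post.html")` gives `blog/post/index.html`, which S3 will serve at `/blog/post/`.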

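Enable index documents on the bucket (this applies to requests made through the bucket's static website endpoint, not the REST endpoint). If the browser requests `/about` and that is not an object, it will see a response of `301 Moved Permanently` with `Location: /about/`.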

When S3 sees a request for `/about/`, it will return the contents of `/about/index.html` without issuing a redirect.
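The two cases above can be modeled as a small lookup. This is a simplified sketch of the behavior as described in this answer, not S3's implementation: it ignores error documents, routing rules and the exact redirect status code, and `resolve` and `INDEX_SUFFIX` are hypothetical names.

```python
from typing import Optional, Set, Tuple

INDEX_SUFFIX = "index.html"  # assumed index document suffix


def resolve(request_path: str, keys: Set[str]) -> Tuple[str, Optional[str]]:
    """Return ("object", key), ("redirect", location) or ("not_found", None)."""
    key = request_path.lstrip("/")
    if key == "" or key.endswith("/"):
        # Trailing slash: serve the index document directly, no redirect.
        candidate = key + INDEX_SUFFIX
        if candidate in keys:
            return ("object", candidate)
        return ("not_found", None)
    if key in keys:
        return ("object", key)  # exact object match
    if f"{key}/{INDEX_SUFFIX}" in keys:
        # No such object, but a "folder" with an index document exists.
        return ("redirect", "/" + key + "/")
    return ("not_found", None)
```

With only `about/index.html` in the bucket, `/about` redirects to `/about/`, `/about/` is served directly, and `/about.html` no longer exists, so there is no duplicate URL for the same content.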


Of course, the extra step in your original workaround (changing the Content-Type in the console after the fact) can be avoided by setting the content type manually when you upload the document in the console. There are many content types the console does not set automatically on upload, so I am in the habit of setting them manually anyway.
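If you upload with a script instead of the console, both the index document setting and an explicit Content-Type can be applied in code. A minimal sketch using boto3's `put_bucket_website` and `put_object`, assuming boto3 is installed and AWS credentials are configured; the helper names are hypothetical, and falling back to `text/html` for extensionless keys is an assumption that only fits a site where every such page is HTML:

```python
import mimetypes


def content_type_for(key: str, default: str = "text/html") -> str:
    """Choose the Content-Type explicitly instead of relying on upload defaults."""
    guessed, _encoding = mimetypes.guess_type(key)
    return guessed or default


def website_configuration(index_suffix: str = "index.html") -> dict:
    """WebsiteConfiguration body that enables index documents."""
    return {"IndexDocument": {"Suffix": index_suffix}}


def enable_index_documents(bucket: str) -> None:
    import boto3  # imported lazily so the helpers above work without boto3

    boto3.client("s3").put_bucket_website(
        Bucket=bucket, WebsiteConfiguration=website_configuration()
    )


def upload_page(bucket: str, key: str, body: bytes) -> None:
    import boto3

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=body, ContentType=content_type_for(key)
    )
```

A call such as `upload_page("example.com", "about/index.html", html_bytes)` (bucket name illustrative) writes the key with its slash directly; no separate "folder" has to be created first.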

Michael - sqlbot
  • Thanks, man. I had already considered this solution, but shied away because of the obvious hackery involved. Too bad there is no "real" way to do it. So basically, each page gets its own folder? – JamesJosephFinn Oct 26 '15 at 01:56
  • Essentially, yes, though folders are something of an illusion in S3. Files are actually stored with literal slashes in their names, and the console generates the folder presentation. If you create files with the API you can just put the slashes in their names and don't actually have to create the folders, though they'll still appear in the console. – Michael - sqlbot Oct 26 '15 at 03:24
  • Of course, you could proxy through nginx to S3 :) though that might defeat whatever reason you might have for eliminating it in the first place. I often run HAProxy in front of buckets, for different reasons, but it would be possible in that environment to append `.html` to the request path if the pattern matched, say, `/[^\.]+$` (path with no `.` between the last slash and the end of the string). – Michael - sqlbot Oct 26 '15 at 03:31
  • That's where logic leads me, too. I lost my DevOps guy, and I don't know how to configure nginx. Cloudflare is my proxy, and they use nginx, so I'm currently searching for a way to do this in their dashboard, but I don't think it's a feature they support (yet). – JamesJosephFinn Oct 26 '15 at 16:42
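The proxy-side rewrite suggested in the comments above can be sketched in Python to show the decision a proxy would make; it is not HAProxy configuration. Two assumptions: the function receives only the path (query string already removed), and the pattern is tightened from `/[^\.]+$` to `/[^./]+$` so that the last path segment must be non-empty and dot-free (otherwise `/about/` would become `/about/.html`). `rewrite_path` is a hypothetical name.

```python
import re

# Last segment after the final slash: at least one character, no dots, no slashes.
EXTENSIONLESS = re.compile(r"/[^./]+$")


def rewrite_path(path: str) -> str:
    """Return the object path a proxy would request from the bucket.

    The browser's URL is unchanged; '.html' is appended only on the
    proxy-to-S3 request, which is what makes this a rewrite, not a redirect.
    """
    if EXTENSIONLESS.search(path):
        return path + ".html"
    return path
```

With this in front of the bucket, the objects can keep their `.html` names, and `/about` is served from `about.html` without any renaming.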