96

If I had a million images, would it be better to store them in some folder/sub-folder hierarchy or just dump them all straight into a bucket (without any folders)?

Would dumping all the images into a hierarchy-less bucket slow down LIST operations?

Is there a significant overhead in creating folders and sub-folders on the fly and setting up their ACLs (programmatically speaking)?

Powerlord
Nikhil Gupte

3 Answers

136

S3 doesn't respect hierarchical namespaces. Each bucket simply contains a number of mappings from key to object (along with associated metadata, ACLs and so on).

Even though your object's key might contain a '/', S3 treats the path as a plain string and puts all objects in a flat namespace.
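
For example, here's a minimal boto3 sketch (the bucket name and keys are made up) showing that slashes are just characters in the key, and that "folders" only appear when you ask a listing call to group keys by a delimiter:

```python
import boto3

s3 = boto3.client("s3")

# The '/' in these keys is just another character in a flat key space.
s3.put_object(Bucket="my-image-bucket", Key="2014/05/cat.jpg", Body=b"placeholder")
s3.put_object(Bucket="my-image-bucket", Key="2014/06/dog.jpg", Body=b"placeholder")

# "Folders" only show up when you list with a Delimiter; CommonPrefixes
# here would contain "2014/05/" and "2014/06/".
resp = s3.list_objects_v2(Bucket="my-image-bucket", Prefix="2014/", Delimiter="/")
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])
```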

In my experience, LIST operations do take (linearly) longer as the object count increases, but this is probably a symptom of the increased I/O required on Amazon's servers and over the wire to your client.

However, lookup times do not seem to increase with object count - it's most probably some sort of O(1) hashtable implementation on their end - so a bucket with many objects should perform just as well as a small one for normal usage (i.e. anything other than LISTs).

As for ACLs, grants can be set on the bucket and on each individual object. As there is no hierarchy, those are your only two options. Obviously, using bucket-wide grants wherever possible will massively reduce your admin headaches if you have millions of files, but remember that you can only grant permissions, not revoke them, so the bucket-wide grants should be the largest set of permissions common to everything in the bucket.
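
To illustrate, a minimal boto3 sketch (canned ACLs, example names) of the two levels you can attach grants to - the bucket itself and each individual object:

```python
import boto3

s3 = boto3.client("s3")

# A grant on the bucket applies to the bucket resource (e.g. who may list it)...
s3.put_bucket_acl(Bucket="my-image-bucket", ACL="private")

# ...while every object carries its own ACL; there is no folder level in
# between to hang permissions on.
s3.put_object_acl(Bucket="my-image-bucket", Key="2014/05/cat.jpg", ACL="public-read")
```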

I'd recommend splitting into separate buckets for:

  • totally different content - having separate buckets for images, sound and other data makes for a more sane architecture
  • significantly different ACLs - if the choice is between one bucket where each object gets its own specific ACL, or two buckets with different bucket-level ACLs and no object-specific ACLs, take the two buckets.
James Brady
  • in S3 there are buckets and then inside you can have 'folders' and 'objects' where a folder is probably an object in the eyes of the system – mwm May 08 '14 at 16:06
  • @mwm you're mistaken. The "folders" are strictly UI niceties provided by whatever tool you're using. James is correct that keys might have slashes but that s3 doesn't care at all -- it definitely doesn't think of them as folders. – Ry4an Brase Jun 19 '14 at 02:37
  • S3 definitely does rate-limit requests based on path prefix; see the official docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html – Anatoly May 17 '15 at 19:06
  • S3 Prefixes are no longer necessary or recommended: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/ – Aea Sep 27 '18 at 02:07
  • @Anatoly Prefixes impact performance because of the internal implementation of the hashing function that is responsible for distributing objects among physical storage locations. But it does not mean that there are folders in place. In fact, you could achieve the exact same performance effect if you used prefixes `ABCD_` instead of `ABCD/` :) – Tomasz Kapłoński Jun 07 '21 at 12:23
66

The answer to the original question "Max files per directory in S3" is: UNLIMITED. See also S3 limit to objects in a bucket.
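
There is no cap on the number of keys, but each LIST call returns at most 1,000 of them, so walking a huge bucket means paging. A minimal boto3 sketch (the bucket name is an example):

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each page holds up to 1,000 keys; the paginator follows the
# continuation tokens for you.
count = 0
for page in paginator.paginate(Bucket="my-image-bucket"):
    count += len(page.get("Contents", []))
print(count, "objects")
```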

Vacilando
-3

I use a directory structure with a root and then at least one subdirectory. I often use the document import date as the directory under the root, which can make managing backups a little easier. Whatever file system you are using, you're bound to hit a file-count limit (a practical limit, if not a physical one) eventually. You might think about supporting multiple roots as well.
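
For example, a small sketch (boto3; the bucket, root, and helper are made up) of building keys from the import date:

```python
import datetime
import boto3

s3 = boto3.client("s3")

def import_key(root: str, filename: str) -> str:
    """Build a key such as 'images/2021/06/07/photo.jpg' from today's import date."""
    today = datetime.date.today()
    return f"{root}/{today:%Y/%m/%d}/{filename}"

s3.put_object(Bucket="my-image-bucket", Key=import_key("images", "photo.jpg"), Body=b"placeholder")
```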

Jim Blizard