I need to store millions of files (many TB in the future) in S3. Are there any limitations? I'm not asking about price :) but about architectural limitations (for example: don't store it this way, another way is better or faster). My files are in a hierarchy:

/{country}/{number}/{code}/docs

and I checked that I can keep them that way (for easy access through REST). I know S3 stores them differently internally, but that's not important to me. So, are there any limitations or pitfalls?

razor

2 Answers

S3 has no limits that you would hit. The files are not really in folders; the keys are just strings that look like paths. Choose a folder structure that is easy for you to keep track of and organize.

You do NOT want to list the "folder" contents in S3 to find things. S3 is slow at giving directory listings because they are not really directories.
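If you do occasionally need a listing (for a test or a sanity check, say), a paginated prefix listing is one way to do it. This is only a minimal sketch, assuming boto3: `iter_keys` is a hypothetical helper name, and the client is passed in as an argument. Each `list_objects_v2` page returns at most 1,000 keys, so a prefix with tens of thousands of objects means many sequential round trips.

```python
def iter_keys(client, bucket: str, prefix: str):
    """Yield every object key under `prefix`, one page at a time.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Usage would look something like `for key in iter_keys(boto3.client("s3"), "my-bucket", "pl/42/abc/"): ...`, where the bucket name and prefix are made up for illustration.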

You should either store the whole path /{country}/{number}/{code}/docs in a database, or make the logic so repeatable that you can be confident the file will be at that location.
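For the "repeatable logic" option, the idea is simply that the key is a pure function of values you already have, so you never need to list anything to find a file. A rough sketch: `build_key` is a hypothetical name, and I'm assuming `docs` is a literal path segment with the file name appended after it, which the question doesn't actually spell out.

```python
def build_key(country: str, number: str, code: str, filename: str) -> str:
    """Build the S3 object key for one document (no leading slash).

    Assumed layout: {country}/{number}/{code}/docs/{filename}
    """
    parts = [country, number, code, "docs", filename]
    for part in parts:
        # Reject empty parts and stray slashes so the key stays predictable.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return "/".join(parts)
```

With this, `build_key("pl", "42", "abc", "report.pdf")` gives `pl/42/abc/docs/report.pdf` every time, so a direct GET on that key replaces any lookup.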

James Brady gave an excellent and very detailed answer on how S3 treats file storage here: https://stackoverflow.com/a/394505/4179009

greg_diesel
  • How slow is the listing? I don't need to list it, maybe only for test or check purposes. – razor May 18 '15 at 19:43
  • With hundreds of objects it's not bad, about a second or two. But in production systems you do not want to list folders with tens of thousands of objects. It's just not very efficient at that. – greg_diesel May 18 '15 at 20:32
  • Sorry, I'm pretty late to the party. Is it generally true that if I have a lot of small files (e.g. game logs), I should just stick them into one single bucket? And because S3 charges more for more requests, should I try to combine multiple logs into one big blob and store/fetch that? – Nicholas Humphrey Nov 09 '21 at 02:22

AWS S3 definitely does have request-rate limits: around 100 requests per second for keys that share a similar path prefix. See the official docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

On the other hand, a hierarchical approach complicates the logic. The right trade-off depends on your requirements. One good option is to put a key of at least 4 characters (a primary ID or a hash) at the front of the object key. If you have a limited number of countries, consider using multiple buckets with the country code in the bucket name; this also lets you choose a specific physical location per bucket if required.
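To illustrate the prefix idea, here is a minimal sketch that puts a short hash in front of the hierarchical path so that keys spread across many different prefixes. The name `hashed_key` and the choice of MD5 are my own assumptions for the example; any stable hash would do, and it is used here only for distribution, not for security.

```python
import hashlib


def hashed_key(path: str, prefix_len: int = 4) -> str:
    """Prepend the first `prefix_len` hex characters of a hash of `path`.

    The prefix is derived from the path itself, so it can always be
    recomputed and does not need to be stored anywhere.
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{path}"
```

Because the prefix is computed from the path, you can still go straight from country, number and code to the final key. The cost is that listing everything for one country by prefix no longer works, which is part of the trade-off mentioned above.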

Anatoly