On 17 July 2018, AWS officially announced that it is no longer necessary to randomize the first characters of every S3 object key to achieve maximum performance: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/
Amazon S3 Announces Increased Request Rate Performance
Posted On: Jul 17, 2018
Amazon S3 now provides increased performance to support at least 3,500 requests per second to add data and 5,500 requests per second to retrieve data, which can save significant processing time for no additional charge. Each S3 prefix can support these request rates, making it simple to increase performance significantly.
Applications running on Amazon S3 today will enjoy this performance improvement with no changes, and customers building new applications on S3 do not have to make any application customizations to achieve this performance. Amazon S3’s support for parallel requests means you can scale your S3 performance by the factor of your compute cluster, without making any customizations to your application. Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput. There are no limits to the number of prefixes.
This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications. This improvement is now available in all AWS Regions. For more information, visit the Amazon S3 Developer Guide.
That's great, but it's also confusing. It says: "Each S3 prefix can support these request rates, making it simple to increase performance significantly."
But since prefixes and delimiters are just arguments to the GET Bucket (List Objects) API when listing the contents of a bucket, how can it make sense to talk about object retrieval performance "per prefix"? Every call to GET Bucket (List Objects) can choose whatever prefix and delimiter it wants, so prefixes are not a predefined entity.
For example, if my bucket has these objects:
a1/b-2
a1/c-3
Then I may choose to use "/" or "-" as my delimiter whenever I list the bucket contents, so I might consider my prefixes to be either
a1/
or
a1/b-
a1/c-
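To make the ambiguity concrete, here is a rough Python sketch of how a listing call groups keys by prefix and delimiter. It runs purely locally and does not call S3; the helper name `common_prefixes` is my own invention for illustration, and the grouping rule (everything up to and including the first delimiter after the requested prefix) is my understanding of the listing behaviour, not code from AWS.

```python
def common_prefixes(keys, delimiter, prefix=""):
    """Roughly mimic how a list call rolls keys up into common prefixes.

    Hypothetical helper for illustration only; it does not talk to S3.
    """
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx != -1:
            # Keep everything up to and including the first delimiter.
            found.add(prefix + rest[: idx + len(delimiter)])
    return sorted(found)


keys = ["a1/b-2", "a1/c-3"]
print(common_prefixes(keys, "/"))  # ['a1/']
print(common_prefixes(keys, "-"))  # ['a1/b-', 'a1/c-']
```

The same two keys yield one "prefix" or two depending only on the delimiter the caller happens to pass, which is exactly why a per-prefix request rate is hard to interpret.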
But since the GET Object API uses the whole key, the concept of a particular prefix or delimiter does not exist for object retrieval. So can I expect 5,500 req/sec on a1/ or, alternatively, 5,500 req/sec on a1/b- and 5,500 on a1/c-?
Can someone explain what the announcement means when it suggests a particular level of performance (e.g. at least 5,500 requests per second to retrieve data) for "each S3 prefix"?