
On 17 July 2018 there was an official AWS announcement explaining that there is no longer any need to randomize the first characters of every S3 object key to achieve maximum performance: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

Amazon S3 Announces Increased Request Rate Performance

Posted On: Jul 17, 2018

Amazon S3 now provides increased performance to support at least 3,500 requests per second to add data and 5,500 requests per second to retrieve data, which can save significant processing time for no additional charge. Each S3 prefix can support these request rates, making it simple to increase performance significantly.

Applications running on Amazon S3 today will enjoy this performance improvement with no changes, and customers building new applications on S3 do not have to make any application customizations to achieve this performance. Amazon S3’s support for parallel requests means you can scale your S3 performance by the factor of your compute cluster, without making any customizations to your application. Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput. There are no limits to the number of prefixes.

This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications. This improvement is now available in all AWS Regions. For more information, visit the Amazon S3 Developer Guide.

That's great, but it's also confusing. It says: "Each S3 prefix can support these request rates, making it simple to increase performance significantly."

But since prefixes and delimiters are just arguments to the GET Bucket (List Objects) API when listing the contents of a bucket, how can it make sense to talk about object retrieval performance "per prefix"? Every call to GET Bucket (List Objects) can choose whatever prefix and delimiter it wants, so prefixes are not a predefined entity.

For example, if my bucket has these objects:

a1/b-2
a1/c-3

Then I may choose to use "/" or "-" as my delimiter whenever I list the bucket contents, so I might consider my prefixes to be either

a1/ 

or

a1/b-
a1/c-

But since the GET Object API uses the whole key, the concept of a particular prefix or delimiter does not exist for object retrieval. So can I expect 5,500 req/sec on a1/, or alternatively 5,500 req/sec on a1/b- and another 5,500 on a1/c-?
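To make the ambiguity concrete, here is a minimal sketch using boto3 (the bucket name is hypothetical, and it assumes the two objects above exist) showing that prefix and delimiter exist only as arguments to the listing call, while GET Object never sees them:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket holding a1/b-2 and a1/c-3

# Listing with "/" as the delimiter groups both keys under one prefix.
resp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # ['a1/']

# Listing with "-" as the delimiter yields two different "prefixes".
resp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="-")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # ['a1/b-', 'a1/c-']

# GET Object takes the whole key; no prefix or delimiter is involved.
obj = s3.get_object(Bucket=BUCKET, Key="a1/b-2")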

So can someone explain what is meant by the announcement when it suggests a particular level of performance (e.g. 5,500 requests per second to retrieve data) for "each S3 prefix"?

John Rees
  • I think I have an explanation for this, but am looking to see if I can find some confirmation. I suspect it has to do with the index partition split algorithm, which is automatic and based on traffic load... and lexical rather than hash based. – Michael - sqlbot Aug 07 '18 at 23:44

1 Answer


What's being referred to here as a "prefix" appears to be an oversimplification: it really refers to each partition of the bucket index. The index is lexical, so splits occur based on the leading characters in the object key; hence, it's referred to as the prefix.

S3 manages the index partitions automatically and transparently, so a "prefix" here can't be pinned down precisely: it's "whatever S3 decides is needed to support your bucket's workload." S3 splits the index partitions in response to workload, so two objects that share a "prefix" today could have different prefixes tomorrow, all done in the background.

Right now, a1/a-... and a1/b-... and a1/c-... may all be a single prefix. But throw enough traffic at the bucket, and S3 may decide the partition should be split, so that tomorrow, a1/a- and a1/b- may be in one prefix, while a1/c- may be in its own prefix. (That is, keys < a1/c- are in one partition, while keys >= a1/c- are now in a different partition.)
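As a purely conceptual illustration (none of this is an actual S3 API, and the split point and trigger are assumptions for the example), a lexical split of an index partition looks like this:

# Conceptual sketch only -- not an S3 API; the split point is assumed.
keys = ["a1/a-1", "a1/b-2", "a1/c-3"]

# Before: a single index partition covers the whole key range.
partition = sorted(keys)

# After a load-driven split at the lexical boundary "a1/c-":
split_point = "a1/c-"
left = [k for k in partition if k < split_point]    # ['a1/a-1', 'a1/b-2']
right = [k for k in partition if k >= split_point]  # ['a1/c-3']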

Exactly where, when, and at what threshold the split behavior is triggered isn't documented, but it appears to be related only to the request rate, not to the number or size of the objects. Previously, these partitions were limited to a few hundred requests per second each, and that limit has been significantly increased.
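Since the announcement says throughput scales per prefix when you issue parallel requests, here is a hedged sketch of what that might look like from the client side (the bucket name and key layout are hypothetical; the per-partition scaling itself happens on S3's side):

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical

def fetch(key):
    # Each GET is independent; S3 routes it to whichever index
    # partition currently owns that key's range.
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

# Spreading hot objects across distinct leading key ranges gives S3
# natural boundaries to split on if the traffic warrants it.
keys = [f"shard{i}/object.bin" for i in range(4)]  # hypothetical keys
with ThreadPoolExecutor(max_workers=8) as pool:
    data = list(pool.map(fetch, keys))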

Michael - sqlbot
  • Very interesting and believable. However, since the prefixes are dynamic based on load, surely that makes it meaningless to assign any specific performance measure "per prefix". If your bucket's prefixes change dynamically, then there is no reliable performance measure. Or perhaps I could deduce that prefixes should in theory change dynamically until I can expect 5,500 req/sec per S3 object? – John Rees Aug 08 '18 at 03:08
  • The performance measure is still useful because bucket scaling only tends to go in one direction -- up, not down. The apparent absurdity of scaling to a single object per partition largely seems to disappear when you realize how much money AWS would be making if you were paying for 5k+ req/s per object. – Michael - sqlbot Aug 08 '18 at 10:19
  • Yes, I was being a bit pedantic with the single object per partition. :-) However, more seriously, I guess this means that if my 10,000-object bucket contains just 10 popular objects, then hopefully S3 would eventually repartition until each of the 10 could get 5k req/sec while the others languish in a couple of large partitions. Plausible? – John Rees Aug 08 '18 at 10:43
  • I have every confidence that S3 would adapt to the workload, yes. Official guidance for high traffic on the request side is, as before, to use CloudFront in conjunction with S3, since CloudFront is globally distributed and will cache the objects in the edges nearest the viewers that request them. The pricing is such that adding CloudFront to S3 often has essentially no impact on overall cost (because S3 doesn't bill for any bandwidth when the request arrives from CloudFront to service a cache miss). – Michael - sqlbot Aug 08 '18 at 12:13
  • Thanks Michael. Really good, careful answers, much appreciated. – John Rees Aug 08 '18 at 19:04