I am using a c5.18xlarge instance with the ENA adapter enabled (so, per AWS support, I should expect 25 Gbps connectivity to S3). I am using the AWS C++ SDK (version 1.3.59) on RHEL 7 to upload a 70 GB file to a single S3 object using a 256 MB part size. Per AWS support, I've set the ClientConfiguration's maxConnections field to 999 and its executor field to a PooledThreadExecutor with a pool size of 999 (both of which improved my performance). I am performing a series of S3Client::UploadPart() calls, doing the threading myself; I get very similar performance when using UploadPartCallable() and letting the SDK manage the threading. A sketch of the setup is below.
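For concreteness, here is a minimal sketch of how the client is configured and how each part upload is issued. The region, bucket, key, upload ID, and part data are placeholders; the real code creates the multipart upload, streams 256 MB slices of the file from each worker thread, and completes the upload afterwards.

```cpp
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/utils/memory/stl/AWSStringStream.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/UploadPartRequest.h>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config;
        config.region = "us-east-1";          // placeholder
        config.maxConnections = 999;          // per AWS support
        config.executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>(
            "uploader", 999);                 // pool size 999, per AWS support

        Aws::S3::S3Client client(config);

        // Each worker thread issues calls like this for its assigned parts.
        // uploadId comes from a prior CreateMultipartUpload call (omitted here).
        Aws::String uploadId = "UPLOAD-ID-PLACEHOLDER";
        auto partData = Aws::MakeShared<Aws::StringStream>("part"); // a 256 MB slice of the file in reality
        *partData << "placeholder bytes";

        Aws::S3::Model::UploadPartRequest request;
        request.SetBucket("my-bucket");       // placeholder
        request.SetKey("my-70gb-object");     // placeholder
        request.SetUploadId(uploadId);
        request.SetPartNumber(1);
        request.SetBody(partData);

        // UploadPartCallable(request) is the variant where the SDK's executor does the threading.
        auto outcome = client.UploadPart(request);
        if (!outcome.IsSuccess())
        {
            // Real code logs the error and retries or aborts the multipart upload.
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```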
Here's the performance I'm seeing:
- 36 threads: 7.5 Gbps
- 200 threads: 15.7 Gbps
AWS support reported similar behavior (actually they were using 900 threads).
I've looked through the underlying implementation of S3Client and all the low-level thread management and curl handle management, and I don't see anything obviously inefficient going on. It just doesn't make sense to me that I would need 200 threads to achieve this throughput on a machine with 36 physical cores. Is this expected? Can someone explain what's happening, or suggest a different way to configure the SDK so it doesn't require this many threads? I think I could provide my own HttpClientFactory and, if I'm careful, customize things to cut out a mutex in how the curl handles are managed, but that seems unlikely to account for what I'm seeing.
Thanks for any help.
-Adam