I am using a c5.18xlarge instance with the ENA adapter enabled (so, per AWS support, I should expect 25 Gbps connectivity to S3). I am using the AWS C++ SDK (version 1.3.59) on RHEL 7 to upload a 70 GB file to a single S3 object using a 256 MB part size. Per AWS support, I've set the ClientConfiguration's maxConnections field to 999 and its executor field to a PooledThreadExecutor with a pool size of 999 (both of which improved my performance). I am performing a series of S3Client::UploadPart() calls, doing the threading myself; I get very similar performance when using UploadPartCallable() and letting the SDK manage the threading. A sketch of the setup is below.
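For concreteness, here is a minimal sketch of how the client is configured and how each part upload is issued. The region, bucket, key, upload ID, and part data are placeholders; the real code creates the multipart upload, streams 256 MB slices of the file from each worker thread, and completes the upload afterwards.

```cpp
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/utils/memory/stl/AWSStringStream.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/UploadPartRequest.h>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config;
        config.region = "us-east-1";          // placeholder
        config.maxConnections = 999;          // per AWS support
        config.executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>(
            "uploader", 999);                 // pool size 999, per AWS support

        Aws::S3::S3Client client(config);

        // Each worker thread issues calls like this for its assigned parts.
        // uploadId comes from a prior CreateMultipartUpload call (omitted here).
        Aws::String uploadId = "UPLOAD-ID-PLACEHOLDER";
        auto partData = Aws::MakeShared<Aws::StringStream>("part"); // a 256 MB slice of the file in reality
        *partData << "placeholder bytes";

        Aws::S3::Model::UploadPartRequest request;
        request.SetBucket("my-bucket");       // placeholder
        request.SetKey("my-70gb-object");     // placeholder
        request.SetUploadId(uploadId);
        request.SetPartNumber(1);
        request.SetBody(partData);

        // UploadPartCallable(request) is the variant where the SDK's executor does the threading.
        auto outcome = client.UploadPart(request);
        if (!outcome.IsSuccess())
        {
            // Real code logs the error and retries or aborts the multipart upload.
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```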
Here's the performance I'm seeing:
- 36 threads: 7.5 Gbps
- 200 threads: 15.7 Gbps
AWS support reported similar behavior (actually they were using 900 threads).
I've looked through the underlying implementation of S3Client and all the low-level thread management and curl handle management, and I don't see anything obviously inefficient going on. It just doesn't make sense to me that I would need 200 threads to achieve this throughput on a machine with 36 physical cores. Is this expected? Can someone explain what's happening, or suggest a different way to configure the SDK so it doesn't require this many threads? I think I could provide my own HttpClientFactory and, if I'm careful, customize things to cut out a mutex in how the curl handles are managed, but that seems unlikely to account for what I'm seeing.
Thanks for any help.
-Adam