
We are finding that using AWS storage (EFS, or EBS with gp2 or gp3 volumes) from an EC2 instance is very slow when doing simultaneous reads. Here's an example:

I'm reading 30 binary files into memory, totaling 46 MB.

Doing this once takes about 16 ms. However, if I spawn 8 parallel processes on the same EC2 instance, each reading different sets of 30 binary files, each one takes an average of 105 ms (556% slower than a single process). It's almost like the 8 reads are happening serially instead of in parallel (though not quite). Note: There is no writing happening to these files at the time.
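For reference, the test is roughly equivalent to the sketch below (Python with multiprocessing is used here only for illustration; the `/data/setN` paths are placeholders, not my actual layout):

```python
import glob
import time
from multiprocessing import Pool


def read_set(file_set):
    """Read every file in file_set fully into memory; return elapsed seconds."""
    start = time.perf_counter()
    for path in file_set:
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical layout: one directory of 30 binary files per process.
    file_sets = [sorted(glob.glob(f"/data/set{i}/*.bin")) for i in range(8)]

    # Baseline: a single process reading one set.
    print("single:", read_set(file_sets[0]))

    # Eight processes, each reading a different set at the same time.
    with Pool(processes=8) as pool:
        print("parallel:", pool.map(read_set, file_sets))
```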

If I repeat the same test on my laptop, using local file storage, the same 8 simultaneous reads of the same files are each only about 70% slower than a single read.

Why is the performance hit of simultaneous reads so much greater using AWS storage?

Is there anything I can configure about the volume that would reduce that performance penalty?

Update: This does not seem to be dependent on reading the same files. I get the same performance whether each process is reading the same 30 files or 30 different files. Title and details updated to account for this.

JoeMjr2
  • Interesting question. I'm not sure of the answer, but I wonder if you could look into disk caching: do the first read, and the subsequent reads should come from the RAM cache and be near instant. I also wonder if it's due to the disk being across the network. 105 ms still seems fairly quick; is it being this slow actually causing a problem? – Tim Jan 20 '23 at 18:46
  • @Tim This is not the actual use case. I just simplified it to demonstrate the issue. The actual use case is more involved, and getting the actual data needed out of the 8 files takes about 360 ms one at a time, and an average of 2.5 seconds each when 8 are done at once. This is indeed a problem at scale. The issue with caching is that (in this example) the file set totals 46 MB, and there may be many such sets of files needed at a time, which would be a lot to cache in memory, so keeping them only on disk is ideal. – JoeMjr2 Jan 20 '23 at 19:20
  • Maybe you could work around it somehow - one thread starts, downloads the files, then makes them available locally. Hopefully someone can help answer your question. – Tim Jan 20 '23 at 20:26
  • 1
    Have you tested with a larger EC2 instance type, striping the data across multiple EBS volumes or a larger EBS volume? EBS performance is a function of network capacity and I'm guessing EFS as well. If the files are large enough you might be hitting the limit of the EC2 instance or a single EBS volume. As an example, we were able to increase DB performance by creating a RAID 0 array across two EBS volumes coupled with the larger instance we used for the DB. For smaller instances we did not see the same gains. – Tim P Jan 20 '23 at 20:32

1 Answer


It turns out that this performance hit was due to a CPU bottleneck on the client. I was trying to read the files with 8 simultaneous processes, but the Docker container I was running them in was limited to only 2 cores. When I upped this to at least 8 cores, performance improved considerably.
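If anyone else hits this, a quick sanity check is to look at what CPU budget the process actually sees inside the container. This is only a rough sketch (Linux-only; the cgroup file paths differ between cgroup v1 and v2, so adjust for your host):

```python
import os

# CPUs the kernel exposes vs. CPUs this process is allowed to run on.
print("os.cpu_count():", os.cpu_count())
print("sched_getaffinity:", len(os.sched_getaffinity(0)))  # Linux-only

# A limit set with `--cpus` is a CFS quota and will NOT show up in the two
# values above; `--cpuset-cpus`, by contrast, would shrink the affinity count.
try:
    # cgroup v2: "max 100000" means unlimited; "200000 100000" means 2 CPUs.
    with open("/sys/fs/cgroup/cpu.max") as f:
        print("cpu.max:", f.read().strip())
except FileNotFoundError:
    try:
        # cgroup v1 fallback: quota / period gives the effective CPU limit.
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as fq, \
                open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as fp:
            print("cfs quota/period:", fq.read().strip(), "/", fp.read().strip())
    except FileNotFoundError:
        print("no CPU quota files found")
```

The fix was simply to start the container with a higher CPU limit (for example `docker run --cpus=8 ...`, depending on how the limit was originally applied).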

JoeMjr2