
I have a bucket bucket_a and am trying to copy its contents into another bucket bucket_b, but it is taking a very long time to copy the objects from folder_a of bucket_a to folder_a of bucket_b. There are nearly 9k objects in each folder, each around 600 MB in size, and I have 20 folders like that.

I tried the bucket's Transfer Acceleration option and used the cp command of the AWS CLI, but it looks like it will take another 2-3 days to copy the contents.

  • did you try aws sync command? – dassum May 14 '19 at 15:38
  • aws s3 sync s3://from_my_bucket s3://to_my_other_bucket – dassum May 14 '19 at 15:45
  • Are the buckets in the same region? – John Rotenstein May 14 '19 at 23:49
  • You could run the `sync` command for each folder simultaneously (in separate CLI sessions) so that the folders copy in parallel. Also, running the `sync` command from an Amazon EC2 instance in the same region might speed it up a bit. (While the data moves directly between buckets without being downloaded, the API calls might go faster.) – John Rotenstein May 14 '19 at 23:52
  • @JohnRotenstein - I did a test on a 30GB folder with 20 files in it using `sync` and `cp` and didn't find any difference in transfer speed, and the buckets are in the same region. – VIPIN KUMAR May 16 '19 at 12:30
  • @dassum - I tried the `sync` command the way you mentioned but got the same result as `cp` – VIPIN KUMAR May 16 '19 at 13:15
  • @VIPINKUMAR Did you try running several `sync` commands at the same time (in separate Terminal windows), with each one doing a different folder? – John Rotenstein May 16 '19 at 21:53
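A minimal sketch of the per-folder parallel approach suggested in the comments above, using placeholder folder names and assuming the AWS CLI is already configured:

```bash
# Run one sync per folder in the background so the folders copy in parallel.
# Folder names are placeholders -- substitute your own 20 folders. Running this
# from an EC2 instance in the same region as the buckets may reduce API latency.
for folder in folder_a folder_b folder_c; do
  aws s3 sync "s3://bucket_a/${folder}/" "s3://bucket_b/${folder}/" &
done
wait   # block until every background sync has finished
```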

1 Answer


One way to speed up this kind of transfer is to ensure the role you are using for the transfer is allowed to read from the source bucket and write to the destination bucket. When the role and bucket policies are set up correctly, the data never has to leave AWS's servers, which enables much faster transfer times and saves your own bandwidth.

See: https://aws.amazon.com/premiumsupport/knowledge-center/copy-s3-objects-account/ and https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html
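As an illustration only (not the exact policy from the linked article), the IAM policy attached to the copying role could grant read access on the source bucket and write access on the destination, using the bucket names from the question:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::bucket_a", "arn:aws:s3:::bucket_a/*"]
    },
    {
      "Sid": "WriteDestinationBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::bucket_b", "arn:aws:s3:::bucket_b/*"]
    }
  ]
}
```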

Another way to speed up the transfer is to tweak some of the settings used during the copy. You can specify the chunk size, how many threads can be doing transfers simultaneously, and other settings. See: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html

You probably want at least 4-10 for max_concurrent_requests (the number of concurrent transfer threads) and to increase multipart_chunksize from the default to a value based on your available memory and how many concurrent requests you allow.
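For example, these settings can be applied with `aws configure set` (the values below are illustrative, not recommendations for your exact workload):

```bash
# Illustrative values only -- tune to your machine's CPU and memory.
aws configure set default.s3.max_concurrent_requests 10
aws configure set default.s3.multipart_chunksize 64MB
```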

I found that the AWS CLI can consume essentially all of your CPU and memory, so be careful how high you set these values.