
I have the access_key and access_id for both AWS buckets, which belong to different accounts. I have to copy data from one bucket to the other. Is there a way to do it faster?

I have tried MapReduce-based distcp, but it does not provide satisfactory performance.

lifeisshubh
  • what does map reduce have to do with this? There is nothing to map / reduce here. In what way does the performance not satisfy? If you have two different sets of credentials there is nothing other than downloading with first credentials and then uploading with the other ones. If you control either of the buckets, you can enable the other account to actually read / write to your bucket so that one set of credentials can access both buckets at the same time, then you can do "normal" s3 copy operations. – luk2302 Feb 10 '21 at 16:37
  • [distCP](https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html) can do a distributed copy on Hadoop based Clusters, there's also a specific AWS version called [s3DistCP](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html) which might conceivably be used here. – Maurice Feb 10 '21 at 16:47
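The download-then-upload route from the first comment (one set of credentials per bucket) can be sketched as follows. This is a minimal sketch, assuming `src_client` and `dst_client` are boto3 S3 clients created with the source-account and destination-account credentials respectively; `relay_object` is a hypothetical helper name, and every byte passes through the machine running it.

```python
from typing import Any


def relay_object(src_client: Any, src_bucket: str, dst_client: Any,
                 dst_bucket: str, key: str) -> None:
    """Stream one object from src_bucket to dst_bucket via this machine.

    Assumes src_client can read src_bucket and dst_client can write
    dst_bucket; neither needs access to the other account's bucket.
    """
    # get_object returns a streaming body rather than the full payload
    body = src_client.get_object(Bucket=src_bucket, Key=key)["Body"]
    # upload_fileobj is boto3's managed upload; it reads the stream in
    # chunks and switches to multipart upload for large objects
    dst_client.upload_fileobj(body, dst_bucket, key)


# Hypothetical usage (placeholder credentials and bucket names):
# import boto3
# src = boto3.client("s3", aws_access_key_id="SRC_ID",
#                    aws_secret_access_key="SRC_SECRET")
# dst = boto3.client("s3", aws_access_key_id="DST_ID",
#                    aws_secret_access_key="DST_SECRET")
# relay_object(src, "source-bucket", dst, "destination-bucket", "data/file.csv")
```

Because the data is downloaded and re-uploaded, throughput is limited by the network of the machine doing the relay, which is why the answer below prefers a single identity and a server-side copy.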

1 Answer


The best way to copy data between Amazon S3 buckets in different accounts is to use a single set of credentials that has permission to read from the source bucket and write to the destination bucket.

You can then use these credentials with the CopyObject() command, which will copy the object between the S3 buckets without the need to download and upload the objects. The copy will be fully managed by the Amazon S3 service, even if the buckets are in different accounts and even different regions. The copy will not involve transferring any data to/from your own computer.
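A server-side copy of a single object with CopyObject might look like the sketch below. It assumes `s3_client` is a boto3 S3 client whose one set of credentials can read the source bucket and write the destination bucket; `server_side_copy` is a hypothetical helper name. A single CopyObject request handles objects up to 5 GB; for larger objects boto3's managed `copy()` method performs a multipart copy instead.

```python
from typing import Any, Optional


def server_side_copy(s3_client: Any, src_bucket: str, key: str,
                     dst_bucket: str, dst_key: Optional[str] = None,
                     acl: Optional[str] = None) -> dict:
    """Ask S3 to copy one object; no data flows through this machine."""
    kwargs = {
        "Bucket": dst_bucket,
        "Key": dst_key or key,
        "CopySource": {"Bucket": src_bucket, "Key": key},
    }
    if acl is not None:
        # e.g. "bucket-owner-full-control" when pushing into another account
        kwargs["ACL"] = acl
    return s3_client.copy_object(**kwargs)


# Hypothetical usage (placeholder bucket names):
# import boto3
# s3 = boto3.client("s3")
# server_side_copy(s3, "source-bucket", "data/file.csv", "destination-bucket")
```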

If you use the AWS CLI `aws s3 cp --recursive` or `aws s3 sync` commands, the copies will be performed in parallel, which makes copying a large number of objects very fast.
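The parallelism the CLI provides can be approximated in code by listing the keys and issuing server-side copies from a thread pool. This is a rough sketch, not how the CLI is implemented: it assumes `s3_client` is a boto3 S3 client (boto3 low-level clients are thread-safe), and `parallel_copy`, `list_keys` and the default of 16 workers are hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Iterator, List, Optional


def list_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under prefix, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Empty result pages have no "Contents" entry
        for obj in page.get("Contents", []):
            yield obj["Key"]


def parallel_copy(s3_client: Any, src_bucket: str, dst_bucket: str,
                  prefix: str = "", acl: Optional[str] = None,
                  workers: int = 16) -> List[str]:
    """Server-side copy of every key under prefix, several at a time."""
    extra = {"ACL": acl} if acl else None

    def copy_one(key: str) -> str:
        # boto3's managed copy() uses multipart copy for objects over 5 GB
        s3_client.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key,
                       ExtraArgs=extra)
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the list mirrors the listing
        return list(pool.map(copy_one, list_keys(s3_client, src_bucket, prefix)))
```

With the CLI itself, the number of concurrent transfers can instead be tuned through the `s3.max_concurrent_requests` configuration setting.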

There are two ways to perform a copy:

Push

  • Use a set of credentials from the Source account that has permission to read from the source bucket
  • Add a Bucket Policy on the destination bucket that permits Write access for these credentials
  • When performing the copy, use ACL=bucket-owner-full-control to assign ownership of the object to the destination account
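For the push approach, the destination bucket policy could be generated along the lines below. This is a sketch under assumptions: `writer_arn` is a hypothetical IAM user or role ARN in the source account, `push_destination_policy` is a hypothetical helper, the condition rejects uploads that omit the bucket-owner-full-control ACL, and the `s3:ListBucket` statement is only needed if you use `aws s3 sync` (which lists the destination). The same identity also needs an IAM policy in its own account allowing these actions.

```python
import json


def push_destination_policy(dst_bucket: str, writer_arn: str) -> dict:
    """Bucket policy for the DESTINATION bucket in the push approach."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSourceAccountWrites",
                "Effect": "Allow",
                "Principal": {"AWS": writer_arn},
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{dst_bucket}/*",
                # Only accept objects that grant the bucket owner full control
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
            {
                # Lets `aws s3 sync` list what already exists at the destination
                "Sid": "AllowSourceAccountList",
                "Effect": "Allow",
                "Principal": {"AWS": writer_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{dst_bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account ID; replace with your own
    policy = push_destination_policy(
        "destination-bucket", "arn:aws:iam::111111111111:user/copy-user")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and attached with `aws s3api put-bucket-policy --bucket destination-bucket --policy file://policy.json`, run with the destination account's credentials.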

OR

Pull

  • Use a set of credentials from the Destination account that has permission to write to the destination bucket
  • Add a Bucket Policy on the source bucket that permits Read access for these credentials
  • (No ACL is required because "pulling" the file will automatically give ownership to the account issuing the command)
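For the pull approach, the matching source bucket policy only needs read and list permissions. As before, this is a sketch: `reader_arn` is a hypothetical IAM user or role ARN in the destination account, `pull_source_policy` is a hypothetical helper, and that identity also needs an IAM policy in its own account allowing these actions plus writes to the destination bucket.

```python
import json


def pull_source_policy(src_bucket: str, reader_arn: str) -> dict:
    """Bucket policy for the SOURCE bucket in the pull approach."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the destination-account identity list the keys to copy
                "Sid": "AllowDestinationAccountList",
                "Effect": "Allow",
                "Principal": {"AWS": reader_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{src_bucket}",
            },
            {
                # Lets it read the objects that the server-side copy reads
                "Sid": "AllowDestinationAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": reader_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{src_bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account ID; replace with your own
    policy = pull_source_policy(
        "source-bucket", "arn:aws:iam::222222222222:user/copy-user")
    print(json.dumps(policy, indent=2))
```

This policy would be attached to the source bucket with `aws s3api put-bucket-policy`, run with the source account's credentials.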
John Rotenstein