3

I have ~1.5 PB of data in S3 in us-west-1 that I want to copy to the us-east-2 region. Should I use cross-region replication or S3 sync? What are the pros and cons of the two options?

I researched a few AWS threads and found that they describe each option in great detail (e.g. https://aws.amazon.com/premiumsupport/knowledge-center/s3-large-transfer-between-buckets/ and https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-migrate-region/) but do not explain the difference between the two.

Please note that our security policies don't allow Snowball Edge.

Can someone please help me?

awsuser2021

2 Answers

4

When you add replication to a bucket, objects that already existed in it are not copied to the destination bucket. Replication also does not replicate objects created with server-side encryption using customer-provided (SSE-C) encryption keys. For more detail, you should read this.

So in this case, you can use either AWS S3 sync or the AWS CLI's cp command (which will be slower), or use Snowball Edge (which you can't do, per your description).

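aws s3 cp --recursive s3://<source-bucket> s3://<destination-bucket> --source-region us-west-1 --region us-east-2
aws s3 sync s3://<source-bucket> s3://<destination-bucket> --source-region us-west-1 --region us-east-2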

aws s3 sync is good for small objects and buckets, but since you mentioned you have petabytes of data, here are two other solutions:

  1. S3 Batch Operations: You can use Amazon S3 Batch Operations to copy multiple objects with a single request.
  2. S3DistCp: The S3DistCp operation on Amazon EMR can copy large volumes of objects across Amazon S3 buckets in parallel. Read more
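
To make the first option a bit more concrete: an S3 Batch Operations job works from a manifest that lists the objects to act on, and one accepted form is a CSV with a bucket name and a URL-encoded key per row. Here is a minimal Python sketch of building such a manifest, assuming you already have the list of keys. The function name `build_manifest`, the bucket name, and the keys are hypothetical; for a bucket this size the manifest would more likely come from an S3 Inventory report than from hand-written code.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket, keys):
    """Return CSV manifest text with one 'bucket,key' row per object.

    Keys are URL-encoded (slashes are left as-is), which is an assumption
    about what the CSV manifest format expects; check the current docs.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in keys:
        writer.writerow([bucket, quote(key, safe="/")])
    return buf.getvalue()


# Hypothetical bucket and keys, for illustration only.
manifest = build_manifest(
    "my-source-bucket",
    ["logs/2020/10/a.json", "reports/q3 summary.csv"],
)
print(manifest)
```

The resulting text would be uploaded to S3 and referenced when creating the copy job.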

Once you have copied your data to the other S3 bucket, you can enable replication, which will replicate all new objects.

Note: These solutions can be expensive, so make sure you read about the costs before using them.

KayD
  • Thanks KayD. Also, do you know whether we can copy this much data using the S3 console? I'd appreciate your thoughts. – awsuser2021 Oct 09 '20 at 20:37
  • @awsuser2021 Snowball Edge is limited to 83 TB of usable storage. AWS Snowmobile can transfer extremely large amounts of data to AWS, up to 100 PB per Snowmobile, but in your case neither option applies because your data is already in AWS. Yes, you can copy to other buckets from the S3 console, which is similar to `aws s3 cp` from the AWS CLI, but I don't recommend it. `aws s3 sync` is a better option than copy. – KayD Oct 09 '20 at 20:49
0

Replication will copy newly PUT objects into the destination bucket.

Sync will copy existing objects to the destination bucket.

Generally you would enable replication and then run sync once to copy the existing objects.
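
As a rough illustration of the "enable replication" half, here is a Python sketch that only builds the JSON for a single replication rule covering the whole bucket. The function name, role ARN, account ID, rule ID, and bucket name are all hypothetical placeholders, and the field layout is my understanding of the replication configuration schema, so check it against the current documentation. Versioning also has to be enabled on both buckets before replication can be turned on.

```python
import json


def replication_config(role_arn, dest_bucket):
    """Build a one-rule replication configuration for every object in a bucket."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: match all keys
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


# Placeholder values, for illustration only.
config = replication_config(
    "arn:aws:iam::123456789012:role/replication-role",
    "my-destination-bucket",
)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed to `aws s3api put-bucket-replication` with `--replication-configuration file://...`, after which a one-time sync covers the objects that already existed.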

Greg
  • What if we overwrite objects and then run sync? How would it tell which version is the latest? I know objects have a version ID, but is S3 smart enough to know that the newer version is the one that arrived through replication rather than through sync? – Bao Thai Nov 19 '20 at 23:31
  • Check the documentation here. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html `The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.` – Greg Nov 20 '20 at 16:27
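
The quoted sentence can be read as a simple decision rule. Below is a simplified Python model of it, not the CLI's actual implementation: the function name and the sample sizes and timestamps are made up, and "local" in the quote is taken to mean the source side of the sync. It looks only at size and last-modified time, as the quoted sentence does.

```python
from datetime import datetime, timezone


def should_copy(src_size, src_mtime, dst_size=None, dst_mtime=None):
    """Return True if sync would copy the source object under the quoted rule."""
    if dst_size is None or dst_mtime is None:
        return True  # nothing at the destination yet
    if src_size != dst_size:
        return True  # sizes differ, so the item is not ignored
    return src_mtime > dst_mtime  # same size: copy only if the source is newer


# Made-up timestamps, for illustration only.
older = datetime(2020, 11, 19, 12, 0, tzinfo=timezone.utc)
newer = datetime(2020, 11, 20, 12, 0, tzinfo=timezone.utc)

print(should_copy(100, older, 100, newer))  # same size, destination newer
print(should_copy(100, newer, 100, older))  # same size, source newer
print(should_copy(100, older, 250, newer))  # sizes differ
```

Under this reading, an object that replication has already delivered with the same size and a later timestamp would be skipped by a subsequent sync.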