What is the best way to continuously copy data from one S3 bucket to another? I understand that S3 supports event notifications and can send them to Lambda, SNS, and SQS, but I am unsure which option to choose. Should I trigger a Lambda function to receive the event records from S3 and copy the objects into the other bucket? Or should I use SNS or SQS for that?
-
Do you mean "copy from one S3 bucket to another S3 bucket"? If the buckets are in different regions, you can use [Cross-Region Replication](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html). If they are in the same region, you would need to code it yourself. – John Rotenstein Nov 02 '18 at 01:15
-
thanks, I know I need to code it myself, but I am not sure how. Should I configure S3 to invoke a Lambda function whenever an object is created, and have the function send it to the destination bucket (another S3)? https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html or https://medium.com/@stephinmon.antony/aws-lambda-with-python-examples-2eb227f5fafe – milad ahmadi Nov 02 '18 at 03:32
-
Does this answer your question? [Fastest way to sync two Amazon S3 buckets](https://stackoverflow.com/questions/39149171/fastest-way-to-sync-two-amazon-s3-buckets) – Channa Sep 12 '20 at 06:32
2 Answers
I would recommend using S3 Replication. You can replicate objects between different AWS Regions or within the same AWS Region.
Cross-Region replication (CRR) is used to copy objects across Amazon S3 buckets in different AWS Regions.
Same-Region replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS Region.
As described here: https://docs.aws.amazon.com/AmazonS3/latest/dev/replication.html
You just need to go to Management -> Replication in your S3 bucket and configure the source and destination buckets. Moreover, you can use encryption:
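Behind that console configuration is a replication rule document. A minimal sketch of one (the role and bucket ARNs are placeholders; note that versioning must be enabled on both buckets) looks like this, and can be applied with `aws s3api put-bucket-replication`:

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Prefix": "",
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket"
      }
    }
  ]
}
```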
Why Use Replication?
Replication can help you do the following:
Replicate objects while retaining metadata — You can use replication to make copies of your objects that retain all metadata, such as the original object creation time and version IDs. This capability is important if you need to ensure that your replica is identical to the source object.
Replicate objects into different storage classes — You can use replication to directly put objects into Glacier, DEEP ARCHIVE, or another storage class in the destination bucket. You can also replicate your data to the same storage class and use lifecycle policies on the destination bucket to move your objects to a colder storage class as it ages.
Maintain object copies under different ownership — Regardless of who owns the source object, you can tell Amazon S3 to change replica ownership to the AWS account that owns the destination bucket. This is referred to as the owner override option. You can use this option to restrict access to object replicas.
Replicate objects within 15 minutes — You can use S3 Replication Time Control (S3 RTC) to replicate your data in the same AWS Region or across different Regions in a predictable time frame. S3 RTC replicates 99.99 percent of new objects stored in Amazon S3 within 15 minutes (backed by a service level agreement). For more information, see Replicating Objects Using S3 Replication Time Control (S3 RTC).
When to Use CRR?
Cross-Region replication can help you do the following:
Meet compliance requirements — Although Amazon S3 stores your data across multiple geographically distant Availability Zones by default, compliance requirements might dictate that you store data at even greater distances. Cross-Region replication allows you to replicate data between distant AWS Regions to satisfy these requirements.
Minimize latency — If your customers are in two geographic locations, you can minimize latency in accessing objects by maintaining object copies in AWS Regions that are geographically closer to your users.
Increase operational efficiency — If you have compute clusters in two different AWS Regions that analyze the same set of objects, you might choose to maintain object copies in those Regions.
When to Use SRR?
Same-Region replication can help you do the following:
Aggregate logs into a single bucket — If you store logs in multiple buckets or across multiple accounts, you can easily replicate logs into a single, in-Region bucket. This allows for simpler processing of logs in a single location.
Configure live replication between production and test accounts — If you or your customers have production and test accounts that use the same data, you can replicate objects between those multiple accounts, while maintaining object metadata, by implementing SRR rules.
Abide by data sovereignty laws — You might be required to store multiple copies of your data in separate AWS accounts within a certain Region. Same-Region replication can help you automatically replicate critical data when compliance regulations don't allow the data to leave your country.

Assuming that both your buckets are in the same region (otherwise you could use Cross-Region Replication), the process would be:
- Create an AWS Lambda function
- Configure S3 events on the bucket to trigger the Lambda function when an object is created
The Lambda function will be passed details of the bucket and object. It should then copy the object to the other bucket.
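The copy step can be sketched as a Lambda handler along these lines (a minimal sketch, assuming Python with boto3; `my-destination-bucket` is a placeholder name). Note that `copy_object` performs a server-side copy, so the object data never passes through the Lambda function itself:

```python
from urllib.parse import unquote_plus

DEST_BUCKET = "my-destination-bucket"  # placeholder: your destination bucket

def copies_from_event(event):
    """Extract (source bucket, key) pairs from an S3 event notification.

    S3 URL-encodes object keys in event records, so decode them here.
    """
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]

def lambda_handler(event, context):
    import boto3  # imported lazily so the parsing helper is testable without AWS
    s3 = boto3.client("s3")
    for src_bucket, key in copies_from_event(event):
        # Server-side copy to the destination bucket under the same key
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )
```

The Lambda function's execution role needs `s3:GetObject` on the source bucket and `s3:PutObject` on the destination bucket.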
There is no need to involve Amazon SNS nor Amazon SQS.
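The trigger in the second step corresponds to a bucket notification configuration along these lines (the function ARN is a placeholder); it can be applied with `aws s3api put-bucket-notification-configuration`, or equivalently by adding the S3 trigger in the Lambda console:

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:copy-to-destination",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```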

-
yes, that makes sense, but these two S3 buckets are located in separate VPCs, so where should I place the Lambda function? Should it be created on the source side or the destination side? And does this solution work when the two buckets are in separate VPCs? – milad ahmadi Nov 02 '18 at 03:47
-
Amazon S3 exists outside of VPCs. It can be accessed from anywhere on the Internet. Do not configure the Lambda function to use a VPC; left outside a VPC, the function connects directly to the Internet, which allows it to communicate with Amazon S3. – John Rotenstein Nov 02 '18 at 04:05
-
Should I ask the people working on the source to add a notification and link it to Lambda, as explained here: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html (steps 7, 8, 9)? Or should I create a Lambda function in my VPC and link it to the source S3 bucket, so that whenever a change happens the function is invoked? – milad ahmadi Nov 02 '18 at 04:29
-
You can either configure the S3 bucket to trigger the Lambda function (7, 8, 9), or when creating the Lambda function you can specify the S3 bucket as a Trigger. Both methods produce the same configuration. Please note: You are _not_ creating the Lambda function in a VPC -- leave the VPC option blank in the Lambda function. – John Rotenstein Nov 02 '18 at 04:37
-
Can I ask you another question, please? I am going to implement this solution for a bank, and they always have security concerns since Lambda is serverless. As you know, the data is encrypted at the source and the destination, so my question is: is the transit channel also encrypted? Or, to put it another way, how does the Lambda function copy data from one S3 bucket to another, and is it secure enough? – milad ahmadi Nov 07 '18 at 03:09
-
You would be best having your organization's security people meet with AWS to discuss security considerations. – John Rotenstein Nov 07 '18 at 11:37