
The requirement is that we store documents in S3 buckets. The client needs a backup at specific intervals, so we copy the documents out to a filesystem, S3, Azure, or some other destination, and sometimes restore from that backup later. Versioning is enabled on the S3 bucket, so we fetch documents by VersionId. If we need to restore documents that come back to us as a filesystem backup, they no longer carry a VersionId, so we can't map them to the records in our DB.

How can we copy all documents from one bucket to another bucket, renaming each object in the destination bucket to the version ID it has in the source bucket?

We tried this using the AWS CLI, the APIs, and the SDK, but these don't work well for millions of objects. I learned about S3 Batch Replication, so I'd like to know how to do this using S3 Batch Replication or any other way.

  • https://aws.amazon.com/blogs/storage/replicating-existing-objects-between-s3-buckets/ – brushtakopo Mar 29 '23 at 07:33
  • I need to rename the destination object with the old version ID @brushtakopo – San Mar 29 '23 at 08:11
  • Why not just use S3 Replication, which will copy all versions of an object to another bucket, very soon after the object is created/updated? – John Rotenstein Mar 29 '23 at 08:13
  • @JohnRotenstein Some customers want the backup on a filesystem, and we lose the version IDs when we transfer files from S3 to a filesystem. So we need to rename each file with its version ID; once we restore the files, we can update our DB to map the correct documents. – San Mar 29 '23 at 10:57

1 Answer


I used an S3 Batch Operations job that invokes a Lambda function. I created a manifest of the documents that need to be copied; the manifest is a CSV file containing the bucket name, object key, and version ID of each document. The Lambda function copies each document to the destination bucket, renaming it to its version ID.
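
For context, here is a minimal sketch of how such a manifest could be generated with boto3 (the bucket name and output path are placeholders, and delete markers are skipped); each row has the form bucket,key,versionId:

import csv
import boto3

s3 = boto3.client('s3')

# Placeholder; replace with your source bucket name
SOURCE_BUCKET = 'source-bucket-name'

# Write one manifest row per object version: bucket,key,versionId
with open('manifest.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    paginator = s3.get_paginator('list_object_versions')
    for page in paginator.paginate(Bucket=SOURCE_BUCKET):
        # 'Versions' lists object versions; delete markers arrive
        # separately under 'DeleteMarkers' and are ignored here
        for version in page.get('Versions', []):
            writer.writerow([SOURCE_BUCKET, version['Key'], version['VersionId']])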

The Lambda function is as follows:

import json
import boto3

s3 = boto3.client('s3')

# Placeholder; replace with the name of your destination bucket
DESTINATION_BUCKET = 'destination-bucket-name'

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    # Parse the S3 Batch Operations event
    invocation_schema_version = event['invocationSchemaVersion']
    invocation_id = event['invocationId']
    treat_missing_keys_as = 'PermanentFailure'
    results = []
    for task in event['tasks']:
        task_id = task['taskId']
        # The bucket ARN has the form arn:aws:s3:::bucket-name
        src_bucket = task['s3BucketArn'].split(':')[-1]
        src_key = task['s3Key']
        src_version_id = task['s3VersionId']

        try:
            # Copy the object to the destination bucket,
            # renaming it to the source version ID
            dest_key = src_version_id
            s3.copy_object(
                CopySource={'Bucket': src_bucket, 'Key': src_key, 'VersionId': src_version_id},
                Bucket=DESTINATION_BUCKET,
                Key=dest_key
            )
            result_code = 'Succeeded'
            result_string = src_key
        except Exception as e:
            # Batch Operations accepts 'Succeeded', 'TemporaryFailure',
            # or 'PermanentFailure' as result codes
            result_code = 'PermanentFailure'
            result_string = f"{src_key}: {e}"

        # Record this task's outcome
        results.append({
            'taskId': task_id,
            'resultCode': result_code,
            'resultString': result_string
        })

    return {
        'invocationSchemaVersion': invocation_schema_version,
        'invocationId': invocation_id,
        'treatMissingKeysAs': treat_missing_keys_as,
        'results': results
    }
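
To run this at scale, the Lambda is set as the operation of the S3 Batch Operations job. As a rough sketch, the job could be created with boto3's s3control client like this (the account ID, ARNs, and ETag below are all placeholders for your own resources):

import boto3

s3control = boto3.client('s3control')

# All of these values are placeholders
ACCOUNT_ID = '111122223333'
LAMBDA_ARN = 'arn:aws:lambda:us-east-1:111122223333:function:copy-by-version-id'
ROLE_ARN = 'arn:aws:iam::111122223333:role/batch-operations-role'
MANIFEST_ARN = 'arn:aws:s3:::manifest-bucket/manifest.csv'
MANIFEST_ETAG = 'example-manifest-etag'  # ETag of the uploaded manifest object
REPORT_BUCKET_ARN = 'arn:aws:s3:::report-bucket'

response = s3control.create_job(
    AccountId=ACCOUNT_ID,
    Operation={'LambdaInvoke': {'FunctionArn': LAMBDA_ARN}},
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            # Column order must match the manifest CSV
            'Fields': ['Bucket', 'Key', 'VersionId']
        },
        'Location': {
            'ObjectArn': MANIFEST_ARN,
            'ETag': MANIFEST_ETAG
        }
    },
    Report={
        'Bucket': REPORT_BUCKET_ARN,
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-reports',
        'ReportScope': 'AllTasks'
    },
    Priority=10,
    RoleArn=ROLE_ARN,
    ConfirmationRequired=False
)
print(response['JobId'])

The completion report written to the report bucket will contain the resultString returned for each task, which is useful for auditing failed copies.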

    
    