
I need to copy large files (possibly greater than 50 GB) from one S3 bucket to another S3 bucket, triggered by an event. I am planning to use s3.Object.copy_from to do this inside a Lambda function (using boto3).

I wanted to see if anyone has tried this. Will this cause any performance issue for larger files (100 GB, etc.) and lead to a Lambda timeout?

If yes, is there any alternative option? (I am trying to use code since I might need to do some additional logic, like renaming the file, moving the source file to an archive, etc.)
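For context, this is roughly what I have in mind (the bucket names and keys below are just placeholders). From what I understand, s3.Object.copy_from issues a single CopyObject request, which is capped at 5 GB per object, while the managed s3.meta.client.copy performs a server-side multipart copy with no such limit, so that is the call I would lean on. The open question is still whether a 100 GB multipart copy fits within the 15-minute Lambda limit.

import boto3

s3 = boto3.resource('s3')

## Placeholder bucket names and keys
copy_source = {'Bucket': 'source-bucket', 'Key': 'incoming/bigfile.dat'}

## Managed copy: boto3 switches to multipart UploadPartCopy calls for
## large objects, and the data is copied server-side, so nothing is
## downloaded into the Lambda function itself
s3.meta.client.copy(copy_source, 'destination-bucket', 'archive/bigfile.dat')

## Example of the extra logic I mentioned: archive the source object
s3.Object('source-bucket', 'incoming/bigfile.dat').delete()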

Note: I am also exploring AWS S3 Replication options, but I am looking for other solutions in parallel.

PythonDeveloper

2 Answers


You can use the AWS S3 Replication feature. It also supports filtering by key prefix and object tags.
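For example, a replication rule with a prefix filter can be attached with boto3. This is only a sketch: the role ARN, bucket names, and prefix are placeholders, and both buckets need versioning enabled for replication to work.

import boto3

s3 = boto3.client('s3')

## Placeholder role ARN, bucket names, and prefix
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-incoming-prefix',
                'Priority': 1,
                'Status': 'Enabled',
                ## Replicate only objects under this key prefix
                'Filter': {'Prefix': 'incoming/'},
                'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
                'DeleteMarkerReplication': {'Status': 'Disabled'},
            },
        ],
    },
)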

Mehmet Güngören

Use a Glue job to copy the file from one bucket to the other, since Lambda has a time limitation.

import sys
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

## Get the source and destination bucket names and file paths
args = getResolvedOptions(sys.argv, ['source_bucket', 'source_key', 'destination_bucket', 'destination_key'])

source_bucket = args['source_bucket']
source_key = args['source_key']
destination_bucket = args['destination_bucket']
destination_key = args['destination_key']

## Create a Glue context and set up a Boto3 resource for S3
glueContext = GlueContext(SparkContext.getOrCreate())
s3 = boto3.resource('s3')

## Copy the object server-side; boto3 falls back to a multipart copy
## for large objects, so the file is never read into job memory and
## the 5 GB single-request limit does not apply
copy_source = {'Bucket': source_bucket, 'Key': source_key}
s3.meta.client.copy(copy_source, destination_bucket, destination_key)
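To wire this into the event-based flow, one option is a small Lambda handler that only starts the Glue job with the object details from the S3 event. This is a sketch: the job name and destination bucket are placeholders you would replace with your own.

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    ## Take the object details from the S3 event notification
    record = event['Records'][0]['s3']
    glue.start_job_run(
        JobName='copy-large-object-job',   ## placeholder job name
        Arguments={
            '--source_bucket': record['bucket']['name'],
            '--source_key': record['object']['key'],
            '--destination_bucket': 'destination-bucket',   ## placeholder
            '--destination_key': record['object']['key'],
        },
    )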