
I need to copy large files (possibly greater than 50 GB) from one S3 bucket to another S3 bucket, triggered by an event. I am planning to use s3.Object.copy_from to do this inside a Lambda function (using boto3).

I wanted to see if anyone has tried this. Will this cause any performance issue for larger files (100 GB, etc.) and lead to a Lambda timeout?

If yes, is there any alternative option? (I am trying to use code since I might need to do some additional logic, like renaming the file, moving the source file to an archive, etc.)
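For context, this is roughly what I have in mind (the bucket names and keys below are just placeholders). From what I understand, s3.Object.copy_from issues a single CopyObject request, which is capped at 5 GB per object, while the managed s3.meta.client.copy performs a server-side multipart copy with no such limit, so that is the call I would lean on. The open question is still whether a 100 GB multipart copy fits within the 15-minute Lambda limit.

import boto3

s3 = boto3.resource('s3')

## Placeholder bucket names and keys
copy_source = {'Bucket': 'source-bucket', 'Key': 'incoming/bigfile.dat'}

## Managed copy: boto3 switches to multipart UploadPartCopy calls for
## large objects, and the data is copied server-side, so nothing is
## downloaded into the Lambda function itself
s3.meta.client.copy(copy_source, 'destination-bucket', 'archive/bigfile.dat')

## Example of the extra logic I mentioned: archive the source object
s3.Object('source-bucket', 'incoming/bigfile.dat').delete()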

Note: I am also exploring AWS S3 Replication options, but I am looking for other solutions in parallel.

PythonDeveloper

2 Answers


You can use the AWS S3 Replication feature. It also supports filtering by key prefix and object tags.
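For example, a replication rule with a prefix filter can be attached with boto3. This is only a sketch: the role ARN, bucket names, and prefix are placeholders, and both buckets need versioning enabled for replication to work.

import boto3

s3 = boto3.client('s3')

## Placeholder role ARN, bucket names, and prefix
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-incoming-prefix',
                'Priority': 1,
                'Status': 'Enabled',
                ## Replicate only objects under this key prefix
                'Filter': {'Prefix': 'incoming/'},
                'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
                'DeleteMarkerReplication': {'Status': 'Disabled'},
            },
        ],
    },
)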

Mehmet Güngören

Use a Glue job to copy the file from one bucket to the other, since Lambda has a time limitation.

import sys
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

## Get the source and destination bucket names and file paths
args = getResolvedOptions(sys.argv, ['source_bucket', 'source_key', 'destination_bucket', 'destination_key'])

source_bucket = args['source_bucket']
source_key = args['source_key']
destination_bucket = args['destination_bucket']
destination_key = args['destination_key']

## Create a Glue context and set up a Boto3 resource for S3
glueContext = GlueContext(SparkContext.getOrCreate())
s3 = boto3.resource('s3')

## Copy the object server-side; boto3 falls back to a multipart copy
## for large objects, so the file is never read into job memory and
## the 5 GB single-request limit does not apply
copy_source = {'Bucket': source_bucket, 'Key': source_key}
s3.meta.client.copy(copy_source, destination_bucket, destination_key)
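To wire this into the event-based flow, one option is a small Lambda handler that only starts the Glue job with the object details from the S3 event. This is a sketch: the job name and destination bucket are placeholders you would replace with your own.

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    ## Take the object details from the S3 event notification
    record = event['Records'][0]['s3']
    glue.start_job_run(
        JobName='copy-large-object-job',   ## placeholder job name
        Arguments={
            '--source_bucket': record['bucket']['name'],
            '--source_key': record['object']['key'],
            '--destination_bucket': 'destination-bucket',   ## placeholder
            '--destination_key': record['object']['key'],
        },
    )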