
I have to move files from S3 to GCS. The problem I have is that on Mondays the upload includes not only Monday's file but also Saturday's and Sunday's, and these files have different dates in their names, for example: stack_20220430.csv, stack_20220501.csv. I need to move all of these files in the same Airflow run. Is that possible? I'm using the S3ToGCSOperator:

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket=config["s3_params"]["s3_source_bucket"],
    prefix=config["s3_params"]["s3_source_prefix"],
    delimiter="/",
    dest_gcs=config["gcs_params"]["gcs_destination"],
    aws_conn_id=config["s3_params"]["s3_connector_name"],
)

Obviously the problem is that prefix takes a fixed value. Can I assign it a range of dates based on {{ ds }}?

Gon Alb

1 Answer


The S3ToGCSOperator copies/moves all files under the bucket/prefix you provide. It does this by listing them and then iterating over each file and copying it to GCS.

prefix is a templated field, so you can use {{ ds }} (or related macros such as ds_nodash) with it.
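For example, a minimal sketch of a templated prefix (the bucket, destination and connection ID below are placeholders; the stack_YYYYMMDD.csv naming comes from the question):

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

# prefix is rendered by Jinja at runtime; {{ ds_nodash }} is the logical date as YYYYMMDD.
move_files_s3_to_gcs = S3ToGCSOperator(
    task_id="move_files_s3_to_gcs",
    bucket="my-source-bucket",                # placeholder
    prefix="stack_{{ ds_nodash }}",           # e.g. stack_20220502
    delimiter="/",
    dest_gcs="gs://my-destination-bucket/",   # placeholder
    aws_conn_id="aws_default",
)

A single {{ ds_nodash }} only covers one date, so for the Monday run you could also generate one such task per day you need to cover, shifting the date with macros.ds_add.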

You can always inherit from S3ToGCSOperator and customize the behavior of the operator to your specific needs.
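As a rough illustration of that (this is not the library's own API: the class name, the prefixes parameter and the assumption that the parent's execute() reads self.prefix at run time are mine and should be checked against your installed provider version), a subclass could loop over several date prefixes inside one task:

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator


class S3ToGCSMultiPrefixOperator(S3ToGCSOperator):
    """Hypothetical subclass: run the parent copy logic once per prefix."""

    # Make the list of prefixes templatable as well.
    template_fields = (*S3ToGCSOperator.template_fields, "prefixes")

    def __init__(self, *, prefixes, **kwargs):
        # Initialise the parent with the first prefix; it is replaced per iteration below.
        super().__init__(prefix=prefixes[0], **kwargs)
        self.prefixes = prefixes

    def execute(self, context):
        # Assumption: the parent's execute() lists and copies files based on self.prefix.
        for prefix in self.prefixes:
            self.prefix = prefix
            super().execute(context)

You could then pass something like prefixes=["stack_{{ ds_nodash }}", "stack_{{ macros.ds_format(macros.ds_add(ds, -1), '%Y-%m-%d', '%Y%m%d') }}", ...] so that a single Monday run also picks up the Saturday and Sunday files.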

Elad Kalif