
I have a Glue job that looks at the files for the current date (each date has its own folder in S3, e.g. "s3://bucket_name/year/month/day") and processes the data in that folder. Now I want to define the input S3 path so that Glue looks at both the previous day and the current day. Is there a way to do this?

current_glue_input_path = "s3://bucket_name/2021/08/12"

I want to find a pattern (a regex, or maybe a wildcard?) that tells Glue to look at both "s3://bucket_name/2021/08/11" and "s3://bucket_name/2021/08/12". Is there a way to do so?

From this documentation: under the 'Example of Excluding a Subset of Amazon S3 Partitions' section:

The second part, 2015/0[2-9]/**, excludes days in months 02 to 09, in year 2015.

Not sure if this makes sense, can someone help please? Thanks.

(I just realized that this documentation describes patterns for the Glue crawler, whereas I'm asking about a Glue job. Am I looking in the wrong place?)

wawawa

1 Answer


Would calculating the current and previous dates programmatically work? Python sample below:

from datetime import datetime, timedelta

# Read the clock once so both paths are derived from the same instant
today = datetime.now()
yesterday = today - timedelta(days=1)
# The %Y/%m/%d format spec produces the year/month/day folder layout directly
current_glue_input_path = f's3://bucket_name/{today:%Y/%m/%d}'
yesterday_glue_input_path = f's3://bucket_name/{yesterday:%Y/%m/%d}'
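To hand both folders to the job in one read, the two paths can be collected into a list. The sketch below is only an illustration: `build_input_paths` and its `days_back` and `prefix` parameters are hypothetical names, not part of Glue, and the commented-out read assumes a `glueContext` object already exists in the job script and that the files are JSON.

```python
from datetime import date, timedelta


def build_input_paths(run_date, days_back=1, prefix="s3://bucket_name"):
    """Return the folder paths for run_date and the days_back days before it, oldest first.

    Hypothetical helper for illustration; the prefix and folder layout
    follow the year/month/day structure described in the question.
    """
    return [
        f"{prefix}/{run_date - timedelta(days=offset):%Y/%m/%d}"
        for offset in range(days_back, -1, -1)
    ]


paths = build_input_paths(date(2021, 8, 12))
print(paths)
# ['s3://bucket_name/2021/08/11', 's3://bucket_name/2021/08/12']

# Inside a Glue job, the list could then be passed as the "paths" connection
# option (assumes glueContext is defined and the data format is JSON):
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options={"paths": paths},
#     format="json",
# )
```

In the job itself, `date.today()` would replace the fixed date. Computing the dates rather than matching them with a wildcard also handles month and year boundaries, where the previous day sits in a different parent folder.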
Rohit P