
I have a use case where new files can show up in an S3 folder at any time, and we would like to import them into Redshift via the RedshiftCopyActivity. I already have a pipeline set up that moves data from S3 to Redshift, but it only handles files with specific, known names. In this case the file names can be arbitrary. I am thinking of something like:

  • Say we have an S3 folder, s3://toProcess
  • Every hour, the Data Pipeline job checks whether there are new files in s3://toProcess
  • If there are, they are processed and then deleted (so they aren't picked up again the next hour); a rough sketch of this loop follows below
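
Roughly, the hourly job would do something like this minimal sketch, written as a plain scheduled script rather than a pipeline; the bucket, target table, cluster endpoint, IAM role, and credentials below are all placeholders:

```python
import boto3
import psycopg2

BUCKET = "toProcess"  # placeholder bucket name from the example above
COPY_SQL = """
    COPY staging_table
    FROM 's3://toProcess/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV GZIP;
"""

def process_new_files():
    s3 = boto3.client("s3")
    # Snapshot the keys present right now. list_objects_v2 returns at
    # most 1000 keys per call; paginate if the folder can grow larger.
    keys = [obj["Key"] for obj in
            s3.list_objects_v2(Bucket=BUCKET).get("Contents", [])]
    if not keys:
        return  # nothing arrived since the last run

    # COPY loads every object under the prefix, so the file names
    # never need to be known in advance.
    with psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                          port=5439, dbname="dev",
                          user="admin", password="...") as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)

    # Delete only the keys captured before the COPY. A manifest file
    # built from this key list would also keep the COPY itself from
    # picking up files that land mid-run.
    s3.delete_objects(Bucket=BUCKET,
                      Delete={"Objects": [{"Key": k} for k in keys]})
```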

Any thoughts on how to get this done?

sumit

1 Answer


When creating a new AWS Data Pipeline there is an option to use a predefined template. For what you need, the Load Data from S3 Into Redshift template should get you most of the way there. You will then need to add an Activity, along the lines of what is described here, to delete the files after they have been loaded.
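
As a sketch of what that extra Activity could look like: a ShellCommandActivity appended to the template's pipeline via a boto3 put_pipeline_definition call. The pipeline id and the "RedshiftCopyActivityId" and "Ec2Instance" ids here are hypothetical and must match the objects the template actually generated:

```python
import boto3

PIPELINE_ID = "df-EXAMPLE1234567"  # hypothetical pipeline id

# Cleanup step: remove the loaded files once the copy succeeds.
delete_activity = {
    "id": "DeleteProcessedFiles",
    "name": "DeleteProcessedFiles",
    "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        # Run only after the Redshift load has finished successfully.
        {"key": "dependsOn", "refValue": "RedshiftCopyActivityId"},
        {"key": "command",
         "stringValue": "aws s3 rm s3://toProcess/ --recursive"},
        {"key": "runsOn", "refValue": "Ec2Instance"},
    ],
}

client = boto3.client("datapipeline")

# put_pipeline_definition replaces the whole definition, so resubmit
# the template's existing objects alongside the new activity.
current = client.get_pipeline_definition(pipelineId=PIPELINE_ID)
client.put_pipeline_definition(
    pipelineId=PIPELINE_ID,
    pipelineObjects=current["pipelineObjects"] + [delete_activity],
)
```

With the pipeline's Schedule set to a one-hour period, each run copies whatever is sitting in s3://toProcess and then clears it, which matches the workflow described in the question.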

JustinDoesWork