
I have a use case where new files can show up in an S3 folder at any time, and we would like to import them into Redshift via the RedshiftCopyActivity. I already have a pipeline set up that moves data from S3 to Redshift, but it only handles files with specific, known names. In this case the file names can be arbitrary. I am thinking of something like:

  • Say we have an S3 folder, s3://toProcess
  • Every hour, the Data Pipeline job checks whether there are new files in s3://toProcess
  • If there are, they are processed and then deleted (so they aren't picked up again the next hour); a rough sketch of this loop follows below
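
Roughly, the hourly job would do something like this minimal sketch, written as a plain scheduled script rather than a pipeline; the bucket, target table, cluster endpoint, IAM role, and credentials below are all placeholders:

```python
import boto3
import psycopg2

BUCKET = "toProcess"  # placeholder bucket name from the example above
COPY_SQL = """
    COPY staging_table
    FROM 's3://toProcess/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV GZIP;
"""

def process_new_files():
    s3 = boto3.client("s3")
    # Snapshot the keys present right now. list_objects_v2 returns at
    # most 1000 keys per call; paginate if the folder can grow larger.
    keys = [obj["Key"] for obj in
            s3.list_objects_v2(Bucket=BUCKET).get("Contents", [])]
    if not keys:
        return  # nothing arrived since the last run

    # COPY loads every object under the prefix, so the file names
    # never need to be known in advance.
    with psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                          port=5439, dbname="dev",
                          user="admin", password="...") as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)

    # Delete only the keys captured before the COPY. A manifest file
    # built from this key list would also keep the COPY itself from
    # picking up files that land mid-run.
    s3.delete_objects(Bucket=BUCKET,
                      Delete={"Objects": [{"Key": k} for k in keys]})
```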

Any thoughts on how to get this done?

sumit

1 Answer


When creating a new AWS Data Pipeline there is an option to use a predefined template. For what you need, the Load Data from S3 Into Redshift template should get you most of the way there. You will then need to add an Activity, along the lines of what is described here, to delete the files after they have been loaded.
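
As a sketch of what that extra Activity could look like: a ShellCommandActivity appended to the template's pipeline via a boto3 put_pipeline_definition call. The pipeline id and the "RedshiftCopyActivityId" and "Ec2Instance" ids here are hypothetical and must match the objects the template actually generated:

```python
import boto3

PIPELINE_ID = "df-EXAMPLE1234567"  # hypothetical pipeline id

# Cleanup step: remove the loaded files once the copy succeeds.
delete_activity = {
    "id": "DeleteProcessedFiles",
    "name": "DeleteProcessedFiles",
    "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        # Run only after the Redshift load has finished successfully.
        {"key": "dependsOn", "refValue": "RedshiftCopyActivityId"},
        {"key": "command",
         "stringValue": "aws s3 rm s3://toProcess/ --recursive"},
        {"key": "runsOn", "refValue": "Ec2Instance"},
    ],
}

client = boto3.client("datapipeline")

# put_pipeline_definition replaces the whole definition, so resubmit
# the template's existing objects alongside the new activity.
current = client.get_pipeline_definition(pipelineId=PIPELINE_ID)
client.put_pipeline_definition(
    pipelineId=PIPELINE_ID,
    pipelineObjects=current["pipelineObjects"] + [delete_activity],
)
```

With the pipeline's Schedule set to a one-hour period, each run copies whatever is sitting in s3://toProcess and then clears it, which matches the workflow described in the question.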

JustinDoesWork