I have a use case where new files can show up in an S3 folder at any time, and we would like to import them into Redshift via the RedshiftCopyActivity. I have a pipeline set up that can move data from S3 to Redshift, but only for files with specific, known names. In this case, however, the file names can be random. I am thinking of something like this:
- Say we have an S3 folder, s3://toProcess.
- Every hour, the Data Pipeline job checks whether there are new files in s3://toProcess.
- If there are, they are processed and then deleted (so they aren't processed again in the next hour); see the sketch after this list.
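
To make the idea concrete, here is roughly the hourly job I'm imagining, written as a plain Python script using boto3 and psycopg2 rather than as an actual Data Pipeline activity. The bucket, table, IAM role, and cluster endpoint below are all placeholders, not my real setup:

```python
import os

import boto3
import psycopg2

# Placeholder names for illustration only.
BUCKET = "toProcess"
TABLE = "staging_table"
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftCopyRole"

s3 = boto3.client("s3")


def process_new_files(conn):
    """COPY every object currently in the bucket into Redshift, then delete it."""
    # list_objects_v2 returns up to 1,000 keys per call; use a paginator
    # if the folder can hold more than that between runs.
    resp = s3.list_objects_v2(Bucket=BUCKET)
    for obj in resp.get("Contents", []):
        key = obj["Key"]
        with conn.cursor() as cur:
            # Redshift pulls the file straight from S3; no local download needed.
            cur.execute(
                f"COPY {TABLE} FROM 's3://{BUCKET}/{key}' "
                f"IAM_ROLE '{IAM_ROLE}' FORMAT AS CSV"
            )
        conn.commit()
        # Delete only after the COPY commits, so a failed run can retry
        # and the next hourly run never sees this file again.
        s3.delete_object(Bucket=BUCKET, Key=key)


if __name__ == "__main__":
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="dev",
        user="admin",
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        process_new_files(conn)
    finally:
        conn.close()
```

The delete-after-commit ordering is the part I care about most, since it's what keeps the hourly runs from double-loading files.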
Any thoughts on how to get this done?