
We receive one incremental file per day from each of multiple sources. Each source places its file under a different S3 prefix, and the files arrive at different times. We want to process the files together in one go and generate a report from them. For this I am using AWS Lambda and AWS Data Pipeline: Lambda is triggered whenever a new file arrives, and it in turn starts the Data Pipeline.

We already have this working for a single source. We created an S3 event trigger for the Lambda, so whenever the file arrives the Lambda is invoked and starts the pipeline, the EMR activity runs, and at the end the report is generated.

Now we have a second source as well, and we want to start the activity only once both files have arrived/been uploaded.

I am not sure whether AWS Lambda can be triggered on more than one dependency. I know this can be done with Step Functions, and I might go that route if triggering Lambda on multiple dependencies is not supported.

In short: trigger an AWS Lambda function only when new files have arrived under two different S3 prefixes. Don't trigger it if a file has arrived under only one of the locations and not the other.
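
To make the requirement concrete, here is a minimal sketch of the "are all sources ready?" check, written as a pure function over a list of object keys. The prefix names and the assumption that each file name contains the date as YYYY-MM-DD are hypothetical, not taken from the actual setup. In a real Lambda the keys would presumably come from listing each prefix, for example with boto3's `list_objects_v2` and its `Prefix` parameter.

```python
from datetime import date


def all_sources_ready(keys, prefixes, day):
    """Return True only if every prefix has at least one key for the given day.

    Assumption: each file name contains the date formatted as YYYY-MM-DD.
    """
    stamp = day.strftime("%Y-%m-%d")
    return all(
        any(key.startswith(prefix) and stamp in key for key in keys)
        for prefix in prefixes
    )


# Hypothetical prefixes and file names, for illustration only.
prefixes = ["source_a/", "source_b/"]
today = date(2019, 11, 4)

only_one = ["source_a/incr_2019-11-04.csv"]
both = only_one + ["source_b/incr_2019-11-04.csv"]

print(all_sources_ready(only_one, prefixes, today))  # one source is still missing
print(all_sources_ready(both, prefixes, today))      # both sources have today's file
```

The Lambda would still be invoked on every upload; it would simply return early whenever this check fails.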

Krish
  • When you say "prefix", do you mean "bucket"? Because a prefix is an actual concept in S3, i.e. part of the file name. So what you're asking at the moment is, trigger a lambda when a file arrives in s3, and of course, your lambda will be getting triggered whenever a file with any prefix is written to your bucket. – 404 Nov 04 '19 at 15:39
  • And if you really do mean prefixes, then you should just make your lambda look in both locations to see if both files are there. If you have only one file, do nothing. If you have both, which will be the time that the lambda was triggered for the second time, then do your processing. This actually applies if you're talking about buckets as well - check both buckets, do your processing only if file is found in both. – 404 Nov 04 '19 at 15:41
  • I am looking at the second option, but there is an issue. Say there are three sources. When the file from the first source arrives, we check whether today's files are present in all three prefixes/locations; only one is, so there is nothing to do. Now the files from the second and third sources arrive at the same time, so there are two Lambda invocations. Both check all three locations, both find all the files available, and the actual logic gets invoked twice. That would be an issue, do you agree? – Krish Nov 04 '19 at 17:23
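
The double-invocation concern raised in the last comment is commonly handled by having each invocation atomically "claim" the day's run before starting the pipeline, so that only the first claimant proceeds. The sketch below illustrates the idea with an in-memory stand-in; `RunClaims`, `maybe_start`, and `run_id` are hypothetical names. In a real deployment the claim would need a shared store with an atomic create-if-absent write. One option (an assumption, not something the thread confirms) is a DynamoDB `put_item` with `ConditionExpression="attribute_not_exists(run_id)"`.

```python
import threading


class RunClaims:
    """In-memory stand-in for a store with an atomic create-if-absent write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def try_claim(self, run_id):
        """Return True for the first caller per run_id, False for all later ones."""
        with self._lock:
            if run_id in self._claimed:
                return False
            self._claimed.add(run_id)
            return True


def maybe_start(run_id, all_ready, claims, start_pipeline):
    """Start the pipeline at most once per run_id, and only when all files are present."""
    if not all_ready:
        return False  # some source has not delivered yet
    if not claims.try_claim(run_id):
        return False  # another invocation already started this run
    start_pipeline(run_id)
    return True


# Two near-simultaneous invocations that both see all files present.
claims = RunClaims()
started = []
first = maybe_start("2019-11-04", True, claims, started.append)
second = maybe_start("2019-11-04", True, claims, started.append)
print(first, second, started)
```

Using the date as the claim key assumes one run per day; a retry after a failed pipeline start would need the claim to be released or keyed differently.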

0 Answers