I have a client who uploads a variable number of files to an S3 bucket. Currently, I have a Lambda function that processes the files that go into that bucket, but it triggers on each file being uploaded. The problem is that I have no idea how many files the client will upload at once: it could be one file, or it could be up to ten. I have some logic in the Lambda function that returns a different output depending on how many files have been uploaded.

I came across the Stack Overflow question "Want to upload multiple files to S3 and only when all are uploaded trigger a lambda function", and I really like the sound of that approach. However, I am not sure how to set it up (or what the policies for multiple file uploads will be). I should be able to get my Lambda function to subscribe to a topic easily enough, but how do I notify an SNS topic that a batch of files (which can vary in number) has been uploaded in a single instance?

John Rotenstein
clattenburg cake
  • How are your clients uploading the files to S3? Are they using the AWS CLI? – John Rotenstein Nov 30 '22 at 23:32
  • @JohnRotenstein Yup, they will be using the CLI (or perhaps a UI where they can drag and drop a file, which I will build). I haven't figured out how to expose the console to a separate IAM user from another account yet – clattenburg cake Dec 01 '22 at 08:23
  • If they are using the AWS CLI then you can use it to send a message to an Amazon SNS topic, too. – John Rotenstein Dec 01 '22 at 09:23
  • @JohnRotenstein This might be a question for another thread, but in general how do you differentiate between an S3 event trigger on upload and a message on an Amazon SNS topic? Can you trigger a message that "multiple files" have been uploaded, so the Lambda function can work on all those files? – clattenburg cake Dec 01 '22 at 11:20
  • Any program (including the AWS CLI) can make a [`Publish()`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sns/publish.html) API call to send a message to an Amazon SNS topic. You can put any message you want in the topic. So, send enough information so that the Lambda function knows what it should do (for example, which Bucket or Folder to process). You could actually `Invoke()` the Lambda function directly, but that's not great from a security/architecture perspective. – John Rotenstein Dec 01 '22 at 11:29

1 Answer

This is a common question!

Basically, 'something' needs to be able to say "All the files are here, start processing them!"

Since you have a variable number of files arriving, merely counting the files will not be sufficient. Instead, you'll need something else to trigger the processing, such as:

  • The client providing a list of all files to be included (a 'manifest file'), or
  • The client performing some action to say "Done, ready for processing", or
  • Waiting a certain amount of time after the last upload (e.g. 10 minutes) and then assuming all files have been provided
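
As a rough illustration of the first and third options, the sketch below checks a batch against a manifest and tests whether a quiet period has elapsed. The manifest format (a JSON object with a `files` list), the function names and the 10-minute default are assumptions made for illustration. How the uploaded keys and the time of the last upload are obtained (for example, by listing the bucket) is left out.

```python
import json
from datetime import datetime, timedelta, timezone


def missing_files(manifest_text, uploaded_keys):
    """Return the manifest entries that have not been uploaded yet."""
    # Assumed manifest format: {"files": ["a.csv", "b.csv"]}
    expected = json.loads(manifest_text)["files"]
    uploaded = set(uploaded_keys)
    return [name for name in expected if name not in uploaded]


def quiet_period_elapsed(last_upload, now, quiet=timedelta(minutes=10)):
    """Return True when no upload has arrived for at least `quiet`."""
    return now - last_upload >= quiet


if __name__ == "__main__":
    manifest = '{"files": ["a.csv", "b.csv", "c.csv"]}'
    # An empty result means the batch is complete and processing can start.
    print(missing_files(manifest, ["a.csv", "c.csv"]))

    last = datetime(2022, 12, 1, 9, 0, tzinfo=timezone.utc)
    print(quiet_period_elapsed(last, last + timedelta(minutes=12)))
```

The quiet-period check would need something to run it on a schedule, since no S3 event fires when uploads simply stop.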

In the Question you linked, the client would be responsible for sending a message to an Amazon SNS topic to trigger processing. This could be achieved by giving them a script file that runs the AWS CLI. This would need IAM credentials, but presumably they have these already since they are uploading to S3?
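
A minimal sketch of that flow in Python, as an alternative to a CLI script: the client publishes a message describing the batch, and the Lambda function subscribed to the topic unpacks it. The message layout (`bucket` and `prefix` fields), the subject text and the function names are assumptions for illustration. The SNS client is passed in so the publishing code can be exercised without AWS access.

```python
import json


def build_batch_message(bucket, prefix):
    """Describe the batch so the Lambda function knows what to process."""
    return json.dumps({"bucket": bucket, "prefix": prefix})


def notify_batch_complete(sns_client, topic_arn, bucket, prefix):
    """Publish a 'batch complete' message and return the SNS MessageId."""
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="batch-complete",  # assumed subject text
        Message=build_batch_message(bucket, prefix),
    )
    return response["MessageId"]


def lambda_handler(event, context):
    """Lambda side: unpack each SNS notification into a batch description."""
    batches = [json.loads(record["Sns"]["Message"]) for record in event["Records"]]
    # Processing of the files under each batch's prefix would go here.
    return batches
```

With boto3 installed and credentials that allow `sns:Publish`, the client script could call `notify_batch_complete(boto3.client("sns"), topic_arn, bucket, prefix)` once the uploads have finished, where `topic_arn` is the ARN of your own topic.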

Other signalling methods from the client could be:

  • After uploading the files, they upload one final file with a special name (e.g. `no-more-files.txt`), which means that processing should commence. The Lambda function could look for this name.
  • They go to a web page and click a button, which triggers the Lambda function.
  • They send an email to a special address, which triggers the Lambda function.
  • They double-click some program/script you have given them.
  • They delete a file in S3 -- for example, if they use Cyberduck to upload files with a nice user interface, they could delete a "HOLD" file, which would then trigger the Lambda function.
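
The special-file option can stay on the existing S3 trigger. The sketch below assumes the marker is called `no-more-files.txt` and that each batch is uploaded under its own prefix (folder); both are assumptions, as are the function names. The handler ignores ordinary uploads and only lists the batch when the marker arrives.

```python
from urllib.parse import unquote_plus

TRIGGER_NAME = "no-more-files.txt"  # assumed name of the marker file


def batches_to_process(event):
    """Return (bucket, prefix) for every marker file in an S3 event."""
    batches = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        prefix, _, name = key.rpartition("/")
        if name == TRIGGER_NAME:
            batches.append((bucket, prefix + "/" if prefix else ""))
    return batches


def list_batch_keys(s3_client, bucket, prefix):
    """List every object under the prefix except the marker itself."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].rpartition("/")[2] != TRIGGER_NAME:
                keys.append(obj["Key"])
    return keys


def lambda_handler(event, context):
    """Ignore ordinary uploads; process a batch only when its marker arrives."""
    import boto3  # imported here so the helpers above work without it

    s3 = boto3.client("s3")
    return {
        f"{bucket}/{prefix}": list_batch_keys(s3, bucket, prefix)
        for bucket, prefix in batches_to_process(event)
    }
```

Listing by prefix also returns objects in nested folders, so this only works cleanly if one prefix holds exactly one batch.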

There are lots of ways, depending on how many clients you have, the technical skills of the client, whether you want to issue them with AWS credentials, and how they upload the files.

John Rotenstein