3

I've built a bit of a pipeline of AWS Lambda functions using the Serverless framework. There are currently five steps/functions, and I need them to run in order and each run exactly once. Roughly, the functions are:

  1. Trigger function by an HTTP request, respond with an ID.
  2. Access an API to get the URL of a resource to download.
  3. Download that resource and upload a copy to S3.
  4. Alter that resource and upload the altered copy to S3.
  5. Submit the altered resource to a different API.

The specifics aren't important, but the question is: What's the best event/trigger to use to move along down this line of functions? The first one is triggered by an HTTP call, but the first one needs to trigger the second somehow, then the second triggers the third, and so on.

I wrote all the code using AWS SNS, but now that I've deployed it to staging I see that SNS often triggers more than once. I could add a bunch of code to detect this, but I'd rather not. And the problem is also compounding -- if the second function gets triggered twice, it sends two SNS notifications to trigger step three. If either of those notifications gets doubled... it's not unreasonable that the last function could be called ten times instead of once.

So what's my best option here? Trigger the chain through HTTP? Kinesis maybe? I have never worked with a trigger other than HTTP or SNS, so I'm not really sure what my options are, and which options are guaranteed to only trigger the function once.

fnsjdnfksjdb

4 Answers

0

AWS Step Functions seems pretty well targeted at this use-case of tying together separate AWS operations into a coherent workflow with well-defined error handling.

Not sure if the pricing will work for you (it can get pricey once you reach millions of operations), but it may be worth looking at.

Also not sure about performance overhead or other limitations, so YMMV.

Justin Grant
  • I wonder if Step Functions guarantees exactly-once invocation of Lambda tasks? Couldn't find any mention in the docs myself. – Aleksi Sep 05 '19 at 06:39
  • My understanding of Step Functions is that the individual task steps are called with exactly-once semantics (see https://forums.aws.amazon.com/thread.jspa?threadID=270231#jive-message-843916), but it's up to your task implementation to ensure that you're calling your Lambda exactly once. – Justin Grant Sep 07 '19 at 19:58
0

You can simply trigger the next lambda asynchronously in your lambda function after you complete the required processing in that step.

So, the first Lambda is triggered by an HTTP call. In that execution, once you finish processing the step, launch the next Lambda function asynchronously instead of sending the trigger through SNS or Kinesis. Repeat this in each of your steps. This way each step is launched only once by the step before it, rather than by a notification that may be delivered twice.
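As a rough sketch of what "launch the next Lambda asynchronously" could look like with the AWS SDK for Python (boto3): the helper below calls `invoke` with `InvocationType="Event"`, which queues the event and returns without waiting for the result. The function name `pipeline-step-3-download`, the payload fields, and the placeholder URL are assumptions made up for illustration. Note that Lambda retries asynchronous invocations when the function raises an error, so this reduces duplicate triggers but the steps should still tolerate an occasional repeat.

```python
import json


def invoke_next_step(lambda_client, function_name, payload):
    """Fire-and-forget invocation of the next Lambda in the pipeline."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: returns once the event is queued
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda answers with HTTP 202 when it accepts an asynchronous invocation.
    return response["StatusCode"] == 202


def handler(event, context):
    # Hypothetical step 2: do this step's work, then hand off to step 3.
    import boto3  # imported here so the helper above can be tested without AWS

    resource_url = "https://example.com/resource"  # placeholder for the real API lookup
    return invoke_next_step(
        boto3.client("lambda"),
        "pipeline-step-3-download",  # hypothetical name of the next function
        {"id": event["id"], "resource_url": resource_url},
    )
```

The calling function's execution role needs permission to invoke the next function (`lambda:InvokeFunction`).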

Suraj
  • Are you suggesting triggering every function through HTTP, but just sending the request and not waiting for a response? Or do you mean something else by "launch the next lambda"? Wondering if there's a way to launch a lambda that I don't know about. – fnsjdnfksjdb Sep 06 '19 at 21:25
  • Basically, I was suggesting that you launch your Lambda functions asynchronously through code, using the AWS SDK (you cannot wait for a response when a function is launched asynchronously). Your first function is triggered by an HTTP request, and in that function you can add code to launch your second Lambda function asynchronously. Modify each of your Lambda functions the same way, so that each one launches the next function in your pipeline asynchronously. – Suraj Sep 08 '19 at 17:39
0

Event-driven Lambda triggers (SNS, S3, CloudWatch, ...) generally guarantee at-least-once invocation, not exactly-once. As you noted, you'd have to handle deduplication manually by, for example, keeping track of event IDs in DynamoDB (using strongly consistent reads!), or by implementing idempotent Lambdas, meaning functions that have no additional effects even when invoked several times with the same input. In your example, step 4 is essentially idempotent provided that the function doesn't have any side effects apart from storing the altered copy, and that the new copy overwrites any previously stored copies with the same event ID.
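Here is a minimal sketch of the deduplication idea, assuming a DynamoDB table whose partition key is named `event_id` (the table, the key name, and the helper names are hypothetical). Instead of a strongly consistent read followed by a write, it uses a single conditional write, which is a swapped-in variant of the same idea: the first invocation to record an event ID wins, and later duplicates are rejected by DynamoDB. The `table` argument is expected to behave like a boto3 `Table` resource.

```python
def claim_event(table, event_id):
    """Return True if this invocation is the first to record event_id."""
    try:
        table.put_item(
            Item={"event_id": event_id},
            # Fails if an item with this key already exists.
            ConditionExpression="attribute_not_exists(event_id)",
        )
        return True
    except Exception as exc:
        # botocore's ClientError carries the error code in exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # duplicate delivery: someone already claimed it
        raise


def run_once(table, event_id, work):
    """Run work() only for the first delivery of event_id."""
    if not claim_event(table, event_id):
        return None
    return work()
```

For an SNS-triggered function the ID could come from `event["Records"][0]["Sns"]["MessageId"]`. One trade-off of claiming before working: if `work()` fails after the claim, a retry will be skipped, so failed claims may need to be released or the work made idempotent anyway.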

One service that does guarantee exactly-once delivery out of the box is SQS FIFO. This service unfortunately cannot be used to trigger Lambdas directly so you'd have to set up a scheduled Lambda to poll the FIFO queue periodically (as per this answer). In your case you could handle step 5 with this arrangement, since I'm assuming you don't want to submit the same resource to the target API several times.

So in summary here's how I'd go about it:

  1. Lambda A, invoked via HTTP, responds with ID and proceeds to asynchronously fetch resource from the API and store it to S3
  2. Lambda B, invoked by S3 upload event, downloads the uploaded resource, alters it, stores the altered copy to S3 and finally pushes a message into the FIFO SQS queue using the altered resource's filename as the distinct deduplication ID
  3. Lambda C, invoked by CloudWatch scheduler, polls the FIFO SQS queue and upon a new message fetches the specified altered resource from S3 and submits it to the other API

With this arrangement even if Lambda B is occasionally executed twice or more by the same S3 upload event there's no harm done since the FIFO SQS queue handles deduplication for you before the flow reaches Lambda C.
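The FIFO part of this arrangement could be sketched roughly as follows with boto3's SQS client. The group ID `altered-resources`, the message body layout, and the `submit` callback are assumptions for illustration. The deduplication ID is a hash of the S3 key rather than the raw filename, simply to keep it within the characters and length SQS accepts. Keep in mind that FIFO deduplication only applies within a 5-minute window, so duplicates arriving later than that would still get through.

```python
import hashlib
import json


def enqueue_altered_resource(sqs_client, queue_url, bucket, key):
    """Lambda B: push one message per altered resource onto the FIFO queue."""
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"bucket": bucket, "key": key}),
        MessageGroupId="altered-resources",  # hypothetical single group
        # Same key -> same ID, so repeated sends are dropped by SQS.
        MessageDeduplicationId=hashlib.sha256(key.encode("utf-8")).hexdigest(),
    )


def drain_queue(sqs_client, queue_url, submit):
    """Lambda C: poll until the queue is empty, submitting each resource."""
    handled = 0
    while True:
        response = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = response.get("Messages", [])
        if not messages:
            return handled
        for message in messages:
            submit(json.loads(message["Body"]))
            # Delete only after a successful submit so failures are retried.
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            handled += 1
```

Deleting after `submit` means a crash between the two calls leads to a second submission, so the final API call is still at-least-once in the failure case.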

Aleksi
  • I find it very annoying that FIFO queues cannot trigger lambdas. This is basically exactly what I need. My process that would be triggered is computationally expensive and I see no justification for why I must allow multiple invocations when standard SQS queues can trigger lambdas – Balter Oct 31 '19 at 17:10
  • As an update, AWS [recently announced](https://aws.amazon.com/blogs/compute/new-for-aws-lambda-sqs-fifo-as-an-event-source/) support for SQS FIFO as Lambda event source. However, they don't guarantee exactly-once delivery when used as a Lambda trigger so the above workflow is still necessary AFAIK. – Aleksi Dec 06 '19 at 07:30
0

AWS Step Functions is meant for you: https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html

You execute the steps you want based on the outputs of previous steps. Each task/step just needs to output correct JSON in the wanted "state".

https://docs.aws.amazon.com/step-functions/latest/dg/concepts-states.html

Based on the state, your workflow will move on. You can create your workflow easily and trigger lambdas, or ECS tasks. ECS tasks are your own "lambda" environment, running without the constraints of the AWS Lambda environment.

With ECS tasks you can run on bare metal, on your own EC2 machine, or in Docker containers on ECS, so your resource limits are extensible rather than fixed. Compare that with Lambda, where the limits are pretty strict: about 500 MB of disk, limited execution time, etc.
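To make the workflow idea concrete, here is a small sketch that builds an Amazon States Language definition chaining Lambda tasks in order, each task's JSON output becoming the next task's input. The state names (`Step1`, `Step2`, ...) and the ARNs in the usage note are placeholders, not anything from the question.

```python
import json


def build_pipeline_definition(lambda_arns):
    """Build a linear state machine: one Task state per Lambda, in order."""
    if not lambda_arns:
        raise ValueError("at least one Lambda ARN is required")
    names = [f"Step{i}" for i in range(1, len(lambda_arns) + 1)]
    states = {}
    for index, (name, arn) in enumerate(zip(names, lambda_arns)):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(names):
            state["Next"] = names[index + 1]  # hand output to the next step
        else:
            state["End"] = True  # last step finishes the execution
        states[name] = state
    return {
        "Comment": "Sequential Lambda pipeline (illustrative sketch)",
        "StartAt": names[0],
        "States": states,
    }


if __name__ == "__main__":
    # Placeholder ARNs; replace with the real function ARNs.
    arns = [
        f"arn:aws:lambda:us-east-1:123456789012:function:step-{n}"
        for n in range(2, 6)
    ]
    print(json.dumps(build_pipeline_definition(arns), indent=2))
```

The resulting JSON string is what you would supply as the state machine definition, for example in the Step Functions console or via the SDK's `create_state_machine` call. The HTTP-triggered function would then start an execution rather than trigger step 2 directly.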

koxon