
Is there a way to run a Lambda on every DynamoDb table record?

I have a DynamoDB table with name, last name, and email attributes, and a Lambda that takes name, last name, and email as parameters. I am trying to configure the environment so that, every day, the Lambda runs automatically for every item it finds in DynamoDB. I can't process all the records in one Lambda invocation because that won't scale; it will time out once more users are added.

I currently have a CloudWatch rule set up that triggers the Lambda on a schedule, but I had to manually add the parameters from DynamoDB to the trigger. It's not automatic and not dynamic; it isn't connected to DynamoDB.

--

Another option would be to run a Lambda every time a DynamoDB record is updated. I could update all the records weekly, and updating them would trigger the Lambda for each record, but I don't know if that's possible either.

Some more insight on either one of these approaches would be appreciated!

Eduardo Vargas

3 Answers


Is there a way to run a Lambda on every DynamoDb table record?

For your specific case, where all you want to do is process each item of a DynamoDB table in a scalable fashion, I'd go with a Lambda -> SQS -> Lambdas fanout like this (a code sketch follows the list):

  1. Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.

  2. The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.

  3. Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.

  4. Connect the worker Lambda to the jobs SQS queue so that an instance of it is invoked whenever a message arrives on the queue.
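
Here's a minimal boto3 sketch of steps 2-4. The table/queue names, environment variables, and attribute keys (`name`, `lastName`, `email`) are assumptions taken from the question, and `process` stands in for whatever your existing Lambda does per item:

```python
import json
import os

import boto3

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

# Hypothetical names -- substitute your own table and queue.
TABLE_NAME = os.environ["TABLE_NAME"]
QUEUE_URL = os.environ["JOBS_QUEUE_URL"]


def dispatch_handler(event, context):
    """Step 2: scheduled entry point. Walks the whole table and writes
    one SQS message per item (in DynamoDB's typed-JSON format)."""
    scan_kwargs = {"TableName": TABLE_NAME}
    while True:
        page = dynamodb.scan(**scan_kwargs)
        for item in page["Items"]:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(item))
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def worker_handler(event, context):
    """Steps 3-4: invoked by the SQS event source mapping, which may
    deliver a batch of messages per invocation."""
    for record in event["Records"]:
        item = json.loads(record["body"])
        # Attribute names here are assumptions taken from the question.
        process(item["name"]["S"], item["lastName"]["S"], item["email"]["S"])


def process(name, last_name, email):
    """Placeholder for whatever your existing Lambda does per item."""
```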

Gabe Hollombe
  • Found this great tutorial on how to do just that, thanks for the tip! https://www.youtube.com/watch?v=lQvTubduQwQ – Eduardo Vargas Jan 29 '19 at 20:23
  • Step #2: when I dispatch a Lambda to scan all the entries in the DDB, I'm not able to complete it. Since the limit is 15 minutes, I'm not sure how to go about that. My DDB is 200 GB – dashuser Jan 07 '20 at 19:39
  • Check out Lambda Destinations -- a new feature that lets you invoke a lambda in an async fashion, telling it what to do when it finishes. Have your dispatch lambda request a page of items from your DDB table via the scan operation, toss all items in that result page into SQS, then invoke another lambda, via Lambda Destinations, to process the next page of results (by passing the pagination token returned from the first page of results). More info on destinations here https://aws.amazon.com/blogs/compute/introducing-aws-lambda-destinations/ – Gabe Hollombe Jan 10 '20 at 06:56
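
To sketch that comment in code (swapping Lambda Destinations for a plain async self-invocation, which chains pages the same way; table/queue names and environment variables are assumptions):

```python
import json
import os

import boto3

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

TABLE_NAME = os.environ["TABLE_NAME"]      # hypothetical configuration
QUEUE_URL = os.environ["JOBS_QUEUE_URL"]


def handler(event, context):
    """Enqueue one page of the scan, then asynchronously re-invoke this
    same function with the pagination token until the table is exhausted."""
    scan_kwargs = {"TableName": TABLE_NAME, "Limit": 1000}
    if event.get("LastEvaluatedKey"):
        scan_kwargs["ExclusiveStartKey"] = event["LastEvaluatedKey"]

    page = dynamodb.scan(**scan_kwargs)
    for item in page["Items"]:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(item))

    if "LastEvaluatedKey" in page:
        # InvocationType="Event" returns immediately, so each invocation
        # only ever handles one page and stays far below the 15-minute cap.
        lambda_client.invoke(
            FunctionName=context.function_name,
            InvocationType="Event",
            Payload=json.dumps({"LastEvaluatedKey": page["LastEvaluatedKey"]}),
        )
```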

Since the limiting factor is the Lambda timeout, run multiple Lambdas using Step Functions: perform a paginated scan of the table, with each Lambda invocation returning the LastEvaluatedKey and passing it to the next invocation to fetch the next page.
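
Here's a sketch of the Lambda task the state machine would loop over; `TABLE_NAME`, the page size, and `process` are placeholders, and the state machine itself (a Choice state that loops back into this task while `LastEvaluatedKey` is non-null) is assumed rather than shown:

```python
import os

import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = os.environ["TABLE_NAME"]  # hypothetical name
PAGE_SIZE = 100


def handler(event, context):
    """One Step Functions task: handle a single page of the scan, then
    return the pagination token so a Choice state can decide whether
    to loop back into this task or finish."""
    scan_kwargs = {"TableName": TABLE_NAME, "Limit": PAGE_SIZE}
    if event.get("LastEvaluatedKey"):
        scan_kwargs["ExclusiveStartKey"] = event["LastEvaluatedKey"]

    page = dynamodb.scan(**scan_kwargs)
    for item in page["Items"]:
        process(item)

    # None serializes to JSON null, which the Choice state treats as "done".
    return {"LastEvaluatedKey": page.get("LastEvaluatedKey")}


def process(item):
    """Placeholder for the per-item work."""
```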

bwest
  • To improve concurrency you can also do a [parallel scan](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan) with multiple lambdas starting at different segments. – Raniz Jan 29 '19 at 06:59
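
As a sketch of what that looks like, a parallel scan just adds `Segment` and `TotalSegments` to each scan call; each concurrent invocation would receive its own segment number in the event (names below are placeholders):

```python
import os

import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = os.environ["TABLE_NAME"]  # hypothetical name


def handler(event, context):
    """Each parallel invocation gets its own slice of the key space,
    e.g. {"Segment": 2, "TotalSegments": 4}; DynamoDB does the
    partitioning for you."""
    scan_kwargs = {
        "TableName": TABLE_NAME,
        "Segment": event["Segment"],
        "TotalSegments": event["TotalSegments"],
    }
    while True:
        page = dynamodb.scan(**scan_kwargs)
        for item in page["Items"]:
            process(item)
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def process(item):
    """Placeholder for the per-item work."""
```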

I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.

A stream is an ordered record of the changes that happen to a table. A stream can invoke a Lambda, so it's automatic (but beware that each change appears only once in the stream, so set up a DLQ in case your Lambda fails). This approach scales well and is also pretty evolvable. If need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.
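
A minimal sketch of such a stream-triggered handler, assuming the stream's view type includes `NEW_IMAGE` and that the attribute names match the question:

```python
def handler(event, context):
    """Invoked by the DynamoDB Streams event source mapping."""
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # ignore deletions
        image = record["dynamodb"]["NewImage"]  # DynamoDB-typed attributes
        # Attribute names are assumptions taken from the question.
        process(image["name"]["S"], image["lastName"]["S"], image["email"]["S"])


def process(name, last_name, email):
    """Placeholder for whatever your existing Lambda does per item."""
```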

Milan Cermak
  • I don't think a DLQ is necessary; AWS Lambda handles DynamoDB streams like Kinesis streams: processing stops when an error is thrown for a record in the stream, and everything is retried until it succeeds or expires. https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html – Mario Feb 08 '19 at 23:23