We are trying out AWS Lambda for our ETL job, which is written in Clojure.

Our architecture is as follows: a scheduler triggers the parent lambda, which then triggers 100 child lambdas and a counter lambda. Each child lambda writes its data to S3 when it completes its work. The counter lambda checks the number of files in S3; if there are 100, it combines all the files and saves the result to S3, otherwise it spawns a new counter lambda and dies.

All the positive scenarios work fine, but if any child fails, the counter lambda ends up in an infinite loop, because there will never be 100 files.

Is there a proper way to spawn child lambdas, monitor them, and restart or retry only the one that fails?

Is there a good Clojure framework for Lambda?

SANN3

2 Answers

Process monitoring is not built into any Clojure Lambda libraries that I know of, so for this case I'd recommend taking a page out of the Erlang playbook (supervisor trees): to have a dependable distributed system, every actor needs a monitor. A decent approach would therefore be to have a watcher for each lambda task. This can really simplify the error handling, along the lines of the "let it crash" philosophy.

So this would leave you with this list of lambdas:

  • counters:
    • a watcher/restarter for the counter (you kind of already have this)
  • workers x100
  • supervisors x100

Each supervisor only checks for the presence of one particular file and restarts one particular lambda if that file does not exist. This gets much easier if your process is idempotent, so you don't have to worry too much if a file is produced twice, though it's not too hard to check whether the lambda a supervisor is watching is still running using the AWS API. The supervisor can be started by the thing it's supervising or by the thing that starts the rest of the system, whichever is easier for your codebase. You likely don't need to explicitly start the workers; the supervisor can do that.
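To make the supervisor idea concrete, here is a minimal sketch in Python (the same shape carries over to Clojure). Everything named here is hypothetical: `supervise_once`, the `exists` and `restart` callables, and the `max_attempts` cap are assumptions for illustration, not part of any Lambda library. In a real deployment `exists` would probably wrap an S3 lookup and `restart` an asynchronous Lambda invocation; they are injected so the decision logic can be read and tested on its own.

```python
from typing import Callable


def supervise_once(
    key: str,
    exists: Callable[[str], bool],
    restart: Callable[[str], None],
    attempts: int,
    max_attempts: int = 3,
) -> str:
    """Check one worker's output file and decide what to do.

    key          -- the single S3 key this supervisor is responsible for
    exists       -- hypothetical callable: returns True if the file is present
    restart      -- hypothetical callable: re-invokes the worker for this key
    attempts     -- how many times the worker has already been restarted
    max_attempts -- assumed cap so a permanently failing worker cannot
                    cause endless restarts

    Returns "done", "restarted", or "gave_up".
    """
    if exists(key):
        # The worker finished; nothing to do.
        return "done"
    if attempts >= max_attempts:
        # Stop restarting and let alerting take over.
        return "gave_up"
    # Safe to call more than once only if the worker is idempotent.
    restart(key)
    return "restarted"
```

For example, with an in-memory set standing in for the bucket:

```python
bucket = set()
restarted = []
supervise_once("part-007", bucket.__contains__, restarted.append, attempts=0)
```

The cap is the part that addresses the infinite loop in the question: once a supervisor reports "gave_up", the counter can stop waiting for 100 files and fail loudly instead.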

The important part is to add CloudWatch or whatever your favourite eventing system is (mine is Riemann) so you can set up alerts and know when you need to watch the watchers.

Arthur Ulfeldt

There is an easy way to do this in AWS, called AWS Step Functions. Step Functions provides a graphical console to arrange and visualize the components of your application as a series of steps. You can define steps using the AWS Step Functions console or API, a fluent Java API, or AWS CloudFormation templates.

Step Functions makes it simple to orchestrate AWS Lambda functions. It manages all the lambdas regardless of the language each function is written in.

Step Functions is good for the following use cases:

  1. Running functions in sequence
  2. Running functions in parallel
  3. Selecting functions based on data
  4. Retrying functions
  5. try/catch/finally handling for functions
  6. Running code for hours
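
As a rough illustration of the parallel and retry cases applied to the question's ETL job, the sketch below builds a state machine definition (Amazon States Language) as a Python dict. The function name `build_state_machine`, the state names, the `$.partitions` input path, and the retry numbers are all assumptions chosen for the example; the ARNs are placeholders. It uses the `Iterator` field of the `Map` state; newer definitions may use `ItemProcessor` instead.

```python
import json


def build_state_machine(worker_arn: str, combiner_arn: str, max_attempts: int = 3) -> dict:
    """Return a state machine definition: fan out workers, retry failures, then combine."""
    return {
        "Comment": "Sketch: run one worker per partition, retry failures, then combine",
        "StartAt": "RunWorkers",
        "States": {
            "RunWorkers": {
                # Map runs the inner state machine once per item in $.partitions.
                "Type": "Map",
                "ItemsPath": "$.partitions",
                "MaxConcurrency": 100,
                "Iterator": {
                    "StartAt": "Worker",
                    "States": {
                        "Worker": {
                            "Type": "Task",
                            "Resource": worker_arn,
                            # Only the failed worker is retried, with backoff.
                            "Retry": [
                                {
                                    "ErrorEquals": ["States.ALL"],
                                    "IntervalSeconds": 5,
                                    "MaxAttempts": max_attempts,
                                    "BackoffRate": 2.0,
                                }
                            ],
                            "End": True,
                        }
                    },
                },
                # Reached only after every worker has succeeded.
                "Next": "Combine",
            },
            "Combine": {"Type": "Task", "Resource": combiner_arn, "End": True},
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; substitute your own functions.
    definition = build_state_machine("arn:aws:lambda:REGION:ACCOUNT:function:worker",
                                     "arn:aws:lambda:REGION:ACCOUNT:function:combiner")
    print(json.dumps(definition, indent=2))
```

With this shape the counter lambda is no longer needed: the `Combine` step runs once all workers have finished, and a worker that keeps failing after its retries fails the execution instead of leaving anything polling.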
SANN3