There are a few pieces of my app that cannot afford the additional 1-2 second delay caused by the "freeze-thaw" cycle that Lambda functions go through when they're new or unused for some period of time.

How can I keep these Lambda functions warm so AWS doesn't have to re-provision them all the time? This goes for both 1) infrequently-used functions and 2) recently-deployed functions.

Ideally, there would be a setting I missed, called something like "keep warm", that increases the cost of the Lambda function but always keeps it warm and ready to respond. I'm pretty sure no such setting exists, though.

I suppose an option is to use a CloudWatch timer to ping the functions every so often... but this feels wrong to me. Also, I don't know the interval that AWS uses to put Lambda functions on ice.
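If it helps, here's roughly what I imagine the handler side of that ping-based approach looking like in Node.js. The `warmup` flag is just a marker I'd set myself as the scheduled event's constant input; it's not anything Lambda itself defines:

```javascript
// Hypothetical handler that short-circuits warm-up pings. The
// scheduled event would be configured with the constant input
// {"warmup": true}; that field is my own convention, not part
// of Lambda itself.
exports.handler = (event, context, callback) => {
  if (event.warmup) {
    // Do no real work; the invocation exists only to keep this
    // container from being reclaimed.
    return callback(null, 'warmed');
  }

  // ...normal request handling goes here...
  callback(null, { statusCode: 200, body: 'real work done' });
};
```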

samcorcos
  • Using something like a CloudWatch timer to ping the function every so often is the only way to accomplish what you want at this time. – Mark B Mar 18 '17 at 18:32
  • Thanks @MarkB Do you have a sense of the ideal interval? I can't seem to find any concrete numbers in the Lambda documentation. Are we talking 1 minute pings? 1 hour pings? – samcorcos Mar 18 '17 at 18:43
  • Anecdotally, I think you want something in the 5 to 15 minute range. Declare a global variable outside the handler, then inside the handler, set that variable to the value from [`context.awsRequestId`](http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html) *only if it's not already set*... then log the value of the variable. This gives you essentially a unique container ID that you can use to track container reuse and determine the effectiveness of your strategy. Store the time in a similar variable, and an incrementing counter, and you get a good perspective (see the sketch just after these comments). – Michael - sqlbot Mar 18 '17 at 18:55
  • @Michael-sqlbot I ran some tests (see link) and it looks like the interval doesn't matter all that much... Thoughts? https://medium.com/@SamCorcos/how-to-keep-your-lambda-functions-warm-9d7e1aa6e2f0#.pn9jpgndt – samcorcos Mar 21 '17 at 00:23
  • Nice post! Note, though, that I believe you're misapplying the "freeze/thaw" concept. The freeze/thaw cycle is what's *preserving* the variables. It's when you're using a newly launched container -- not a thawed one that was previously frozen -- that you see the delay: ['Lambda will actually “freeze” the process and thaw it out the next time you call the function (but *only* if the container is reused, which isn’t a guarantee).'](https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/) The long spin-up times are when there's not a frozen container available to be thawed and reused. – Michael - sqlbot Mar 21 '17 at 01:37
  • The other thing worth noting is that what you're doing here is keeping one container ready to go... but one container only handles one concurrent request at a time. (The runtime+memory-based billing model wouldn't make sense otherwise, not to mention the complexity increase.) If a second request comes in while that container is busy processing the first one, the second request will still see the spin-up delay... unless, of course, you managed to keep two (or however many) containers ready. But, with an increase in traffic, that does tend to solve itself over time. – Michael - sqlbot Mar 21 '17 at 01:42
  • @Michael-sqlbot makes sense, thank you for the clarification. I went ahead and updated the article accordingly. – samcorcos Mar 22 '17 at 18:54
  • @Michael-sqlbot the iPlayer engineering post I added as an answer below suggests otherwise. They pre-warm one container to later run hundreds of requests in parallel. If your assumption was right, this wouldn't work. – Jonathan Oct 12 '17 at 09:13
  • @Jonathan, my assumption isn't an assumption. You can easily observe this by creating a global variable, populating it with a random value if undefined, and then exposing its value in the Lambda response or the logs. Add to that, a counter incremented with each invocation. These serve as a container identifier and a request counter. My suspicion is that the authors of that post have not fully confirmed their assumptions and jumped to a conclusion that they are accomplishing something that they aren't. Test and see what you find. – Michael - sqlbot Oct 12 '17 at 10:53
  • Additionally, if you think about it, 200 simultaneous invocations in one container would perform terribly, because the 200 tasks would be competing for CPU and memory. If not, then how did the system know to run these 200 invocations in a container on a machine with 200 cores and 300 GiB of RAM? To ask the question is to answer it... it can't possibly work that way. – Michael - sqlbot Oct 12 '17 at 10:57
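For reference, a minimal sketch of the container-tracking instrumentation Michael - sqlbot describes in the comments above. The variable names are illustrative; only `context.awsRequestId` comes from the Lambda API:

```javascript
// Globals survive across invocations only when the same container
// is reused, so they make a cheap container-reuse probe.
let containerId;        // set once per container
let coldStartTime;      // when this container first ran
let invocationCount = 0;

exports.handler = (event, context, callback) => {
  if (!containerId) {
    // First invocation in this container: adopt this request's
    // ID as a de-facto container identifier.
    containerId = context.awsRequestId;
    coldStartTime = new Date().toISOString();
  }
  invocationCount++;

  // Logging these three values lets you see how often containers
  // are reused and how long they survive.
  console.log(JSON.stringify({ containerId, coldStartTime, invocationCount }));
  callback(null, 'ok');
};
```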

2 Answers

UPDATE DEC 2019

AWS now also offers ['Provisioned Concurrency'](https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/).

Basically, you pay around $10/month (for a 1 GB Lambda) per instance that you want to keep 'warm'.
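You'd normally configure this in the console, CloudFormation, or the CLI, but as one illustration, here's a sketch using the AWS SDK for JavaScript (the function name, version, and count are placeholders). Note that Provisioned Concurrency must target a published version or alias, not `$LATEST`:

```javascript
// Sketch: keep 5 execution environments warm for version 3 of
// "my-function". Names and numbers are placeholders.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'us-east-1' });

lambda.putProvisionedConcurrencyConfig({
  FunctionName: 'my-function',        // placeholder
  Qualifier: '3',                     // a published version or alias
  ProvisionedConcurrentExecutions: 5, // number of warm instances
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.Status);      // e.g. "IN_PROGRESS"
});
```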

GeertPt
  • 16,398
  • 2
  • 37
  • 61

The BBC has published a nice article on iPlayer engineering in which they describe a similar issue.

They have chosen to call the function every few minutes using CloudWatch Scheduled Events.

> So in theory, it should just stay there, except it might not. So we have set up a scheduled event to keep the container ‘warm’. We invoke the function every couple of minutes not to do any processing, but to make sure we’ve got the model ready. This accounts for a very small percentage of invocations but helps us mitigate race conditions in the model download. (We also limit artificially how many lambdas we invoke in parallel as an additional measure).
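As a rough sketch of how the scheduling side might be wired up with the AWS SDK for JavaScript (the function ARN is a placeholder, and the `warmup` input is just a convention the handler would check for, as in the question above):

```javascript
// Sketch: a CloudWatch Events rule that pings the function every
// 5 minutes. You also need a lambda.addPermission call (not shown)
// so CloudWatch Events is allowed to invoke the function.
const AWS = require('aws-sdk');
const events = new AWS.CloudWatchEvents({ region: 'us-east-1' });

events.putRule({
  Name: 'keep-warm',
  ScheduleExpression: 'rate(5 minutes)',
}, (err) => {
  if (err) return console.error(err);
  events.putTargets({
    Rule: 'keep-warm',
    Targets: [{
      Id: 'keep-warm-target',
      Arn: 'arn:aws:lambda:us-east-1:123456789012:function:my-function', // placeholder
      Input: JSON.stringify({ warmup: true }), // our own marker
    }],
  }, (err2) => {
    if (err2) console.error(err2);
  });
});
```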

Jonathan