Are there conditions under which variables in an AWS Node Lambda persist between invocations?

Question

I wouldn't think so, but I don't have another good explanation for what I observed. Here's a rough version of the relevant code, which is inside the handler function (i.e., it would not be expected to persist between invocations):

const res = await graphqlClient.query({my query})
const items = res.data.items
console.log(items) // <- this is the line that logs the output below

items.push({id: 'some-id'})
const itemResults = await Promise.all(items.map((item) => etc etc))

Over successive invocations from my client, spaced less than ten seconds apart, some-id was repeatedly added to items. On the first invocation, this is what was logged in CloudWatch after const items = res.data.items:

[
  {
    anotherId: 'foo',
    id: 'bar',
  }
]

The 2nd time it was invoked, after a few seconds, written to the logs before the call to items.push():

[
  {
    anotherId: 'foo',
    id: 'bar',
  },
  { id: 'some-id' }
]

The 3rd time, again written to the logs before the call to items.push():

[
  {
    anotherId: 'foo',
    id: 'bar',
  },
  { id: 'some-id' },
  { id: 'some-id' }
]

items is never written to persistent storage. items is only modified twice: when it's set to the value returned by the GraphQL query, and when I manually push another value onto the array. I can prevent this bug by checking whether some-id is already in the array, so I'm unblocked for now, but how could it persist over successive runs? I never would've expected a Lambda to behave that way! I thought each invocation was stateless.

score 3 · Answer 1 · answered Feb 11 '21 at 08:31

AWS Lambda is mostly stateless, but not entirely; it's up to you to make sure your function really is. Since your example code above is missing a handler function, I assume you didn't provide the full code and that you have defined const items outside of your handler function. A rough explanation based on my assumption:

Everything outside of your handler function is initialized once when starting your Lambda function for the first time (i.e. 'cold start'). You can initialize variables, database connections, etc. and reuse them in every invocation as long as the instance of your Lambda function stays alive.
Then, your handler function is invoked after the initialization steps and also for each future invocation. If you change values/objects outside of your handler function, then they'll survive the current invocation and you can use them in your next invocation. This way, you can cache some expensive data or do some other optimizations. For example:

const items = []

exports.handler = function(...) {
  // ...
  items.push(...)
  // ...
}

This is also true for Java and Python Lambda functions and I believe for most other runtimes as well. Now, this is probably the explanation for what you observe: in one invocation you are pushing something to items, and in the next invocation the previous data has survived because it was stored outside of the handler function.

Suggestion in your case: if you want fully stateless functions, don't modify data outside of your handler function; only store values inside it. But be aware that this can slow down your Lambda function if you need to initialize data on every invocation.
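One way data "outside the handler" can be modified without it being obvious is through a shared object reference. The sketch below assumes, purely for illustration, a client that hands back the same in-memory object on every query; fakeClient and cachedResponse are hypothetical stand-ins, not the API of any real GraphQL library, and whether this applies to your code is unverified. It contrasts pushing onto the returned array with building a fresh one:

```javascript
// Hypothetical stand-in for a client that keeps results in memory.
// It lives at module scope, so it survives across warm invocations.
const cachedResponse = { data: { items: [{ anotherId: 'foo', id: 'bar' }] } };
const fakeClient = {
  // Assumption: every call resolves to the very same object.
  query: async () => cachedResponse,
};

// Mutates the array owned by the client, so the push leaks into later calls.
async function leakyHandler() {
  const res = await fakeClient.query();
  const items = res.data.items;
  const lengthBeforePush = items.length;
  items.push({ id: 'some-id' });
  return lengthBeforePush;
}

// Builds a new array per invocation and leaves the client's data untouched.
async function safeHandler() {
  const res = await fakeClient.query();
  const items = [...res.data.items, { id: 'some-id' }];
  return items.length;
}
```

With the copy, every invocation starts from whatever the client returned, so nothing added inside the handler carries over to the next call.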

Since this behavior of AWS Lambda is often used for caching data, there are a few blog posts covering this topic as well and how the code is handling it. They usually provide more visual explanations and example code:

Caching in AWS Lambda (note: my own blog post)
Leveraging Lambda Cache for Serverless Cost-Efficiency
All you need to know about caching for serverless application (This is covering much more about caching but one part of it is also considering caching inside a Lambda function)

There's much more happening behind the scenes, of course. If you are interested in how the whole process works, I recommend taking a look at the Execution Environment Details. The article focuses more on background for building extensions and on how the process outside of your code works, but it might help you understand what's happening behind the scenes.

`items` (& other code above) is defined in the handler function. There's a few vars outside – some GQL, imports, and a method to which the content of `items` are passed (via `await Promise.all()`). It's set to equal the return value of a GQL query, and a single value is pushed to it. How the previous pushes persist, after the value is set to the query results, I don't know. That's what's so puzzling, everything I've read suggests this *shouldn't* happen. I haven't put together a minimal version of the code which would make this reproducible, since I have a workaround, but I'm really curious. — Jim J, Feb 11 '21 at 15:04
And what kind of GraphQL client are you using? Is that maybe caching the response inside the code? (Because you define items as `res.data.items` - maybe the response is somehow cached) — s.hesse, Feb 11 '21 at 15:06
Seems like you're not alone there.. https://stackoverflow.com/questions/51306956/aws-appsync-query-returns-cached-response-even-when-offline-is-disabled Does this help you? — s.hesse, Feb 11 '21 at 15:08
Thanks for trying to run this down, but that doesn't help either. I'm already using the correct fetchPolicy setting in the answer to that one, and the duplicated value isn't returned by the query. The duplicated value is added afterward, to the query results, and then not saved anywhere, which makes it even *more* odd that it persists. — Jim J, Feb 11 '21 at 15:42
