13

I am experiencing an issue where Lambda functions occasionally time out without any error message other than a notification that the function timed out.

In order to find the root of the issue, I added logging at various points throughout my function and determined that everything functions properly until the first getItem() request to read data from DynamoDB. The read seems to be taking more than the 3.00 second timeout.

Naturally, I checked my DynamoDB table to see if there were any throttled reads or errors. DynamoDB's metrics show no throttles or errors, and read times remain in the double-digit milliseconds at most.

Clearly something is going wrong or getting dropped along the way. How can I fix this issue or at least catch it and retry the read?

This is a read-oriented function for a web API, so response times are critical. Hence, an increased timeout will not solve the issue.

dynamodb.getItem({
  "TableName": "tablename",
  "Key": { "keyname": { "S": "keyvalue" } },
  "AttributesToGet": [ "attributeA", "attributeB" ]
}, function(err, data) {
  if(err){
    context.done(err);
  } else {
    if("Item" in data){
      nextFunction(event, context);
    } else {
      context.done("Invalid key");
    }
  }
});

No throttled reads

Read latency appears to be minimal

Tyler

5 Answers

6

After significantly increasing the timeout, I found that a network error is eventually thrown:

{
    "errorMessage": "write EPROTO",
    "errorType": "NetworkingError",
    "stackTrace": [
        "Object.exports._errnoException (util.js:870:11)",
        "exports._exceptionWithHostPort (util.js:893:20)",
        "WriteWrap.afterWrite (net.js:763:14)"
    ]
}

This appears to stem from an incompatibility between Node.js and OpenSSL, according to this thread. It reportedly affects Node.js 4.x and up but not 0.10, so you can resolve it either by downgrading the Lambda runtime to Node.js 0.10 or by adding the following code when using aws-sdk:

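var AWS = require("aws-sdk");
var https = require("https");

// Force TLS 1.0 for requests to DynamoDB to work around the EPROTO error.
var dynamodb = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      rejectUnauthorized: true,
      secureProtocol: "TLSv1_method",
      ciphers: "ALL"
    })
  }
});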
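
If the goal is to catch a hung read and retry it rather than wait for the Lambda to time out, one possible approach is to race each call against a timer. This is only a sketch: `withTimeout` and `retryWithTimeout` are hypothetical helpers (not part of aws-sdk), it assumes a Node.js runtime with `async`/`await` and `Promise.prototype.finally`, and the attempt count and per-try timeout are values you would need to tune.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timed out after " + ms + " ms")), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: call `fn` up to `attempts` times, allowing `ms` per try.
// Rethrows the last error if every attempt fails or times out.
async function retryWithTimeout(fn, attempts, ms) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(fn(), ms);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With aws-sdk v2 this might be used as `retryWithTimeout(() => dynamodb.getItem(params).promise(), 3, 500)`. Note that a timed-out request is abandoned, not cancelled, so it may still complete in the background. The aws-sdk v2 client constructor also accepts `maxRetries` and `httpOptions.timeout` / `httpOptions.connectTimeout`, which may be enough on their own.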
Tyler
  • Did you try using TLS v1? – Tyler Nov 05 '18 at 18:39
  • 2
    The tricky thing here is that to be able to debug these issues you need to let the DynamoDB call fail. If used within a Lambda, make sure the call fails before the Lambda (or API Gateway) times out. We wrote about it in detail here - https://seed.run/blog/how-to-fix-dynamodb-timeouts-in-serverless-application – jayair Aug 06 '19 at 23:26
  • Thanks, increasing the timeout was the crucial remark for me. – pauldendulk Aug 23 '21 at 07:36
  • What is "significantly increasing the timeout?" A couple of minutes, hours, more? I'm experiencing a similar issue where the Lambda times out without giving an exception. I'm wondering what a reasonable lambda timeout value is to catch the error. – h0r53 Nov 07 '22 at 20:12
  • 1
    @h0r53 a minute or a couple of minutes was plenty in my case – Tyler Nov 08 '22 at 02:09
5

Ran into random Lambda timeout issues while putting data from Lambda to DynamoDB. The Lambda resides in a VPC (per organization policy).

Issue: Some (random) Lambda containers would consistently fail while putting data and time out (the timeout was set to 30 seconds), while other containers finished putting data in a few milliseconds.

Root cause: Two subnets were configured (as suggested by AWS): one private and one public. When a new Lambda container is spun up, it randomly selects one of the subnets. If it chose the public subnet, it would consistently fail. If it chose the private subnet, it would finish in a few milliseconds.

Solution: Remove the public subnet and configure two private subnets instead.

SystemDLL
  • What causes it to fail in the public subnet and succeed in the private subnet? What is the reason that the lambda function has to be in the private subnet? – xtra May 28 '19 at 08:42
  • 2
    Lambda function is in a VPC, as it accesses an Aurora RDS. For more information on how to access internet (DynamoDB endpoint in this case) from a lambda in VPC, please refer to https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/ – SystemDLL Jun 03 '19 at 19:22
4

If you are launching your Lambda in a VPC, try launching it in a private subnet instead of a public subnet. I had the same problem, and launching the Lambda in a private subnet worked for me.

user1297406
  • I had the same issue: because I didn't have a NAT gateway set up on the private subnets, the Lambda was not able to call the DynamoDB API. – gmanolache Nov 15 '20 at 14:15
3

Don't forget to add this rule to the security group of your Lambda running in a private subnet: an outbound HTTPS connection to the prefix list ID of your DynamoDB VPC endpoint.

This took me a couple of hours to realize. Lambda uses HTTPS to contact DynamoDB's gateway endpoint in your VPC.
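
As a rough illustration of such a rule, the object below sketches parameters that could be passed to `authorizeSecurityGroupEgress` on the aws-sdk v2 `AWS.EC2` client. Both IDs are placeholders (assumptions, not real values); the actual prefix list ID is the one shown for your DynamoDB gateway endpoint.

```javascript
// Placeholder IDs (assumptions): substitute your Lambda's security group ID
// and the prefix list ID of your DynamoDB gateway endpoint.
const egressRule = {
  GroupId: "sg-EXAMPLE",
  IpPermissions: [
    {
      IpProtocol: "tcp",
      FromPort: 443, // HTTPS only
      ToPort: 443,
      PrefixListIds: [
        { PrefixListId: "pl-EXAMPLE", Description: "HTTPS to DynamoDB gateway endpoint" }
      ]
    }
  ]
};

// With aws-sdk v2 installed and credentials configured, this could be applied as:
// new AWS.EC2().authorizeSecurityGroupEgress(egressRule).promise();
```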

  • This is a great answer and was the solution to my issue. In my case, all the VPC stuff was configured correctly but I still could not connect to DynamoDB from my Lambda. Adding an Outbound security group rule that allowed for HTTPS connections fixed the issue. – h0r53 Nov 08 '22 at 19:58
  • Thank you. This deserves more attention. I deployed a dynamodb VPC gateway endpoint and had no idea, why connections timed out. – florian norbert bepunkt Aug 25 '23 at 19:01
1

In our case the Lambda resided in a VPC, in an internal-public subnet, and we had to add a Gateway VPC Endpoint for DynamoDB. The Gateway Endpoint has IP prefix lists, which had to be added to the subnet's ACL inbound/outbound rules. Then it started working.

A K