AWS Lambda requesting another lambda behind private API gateway - DNS resolve not working

Question

I'd like a lambda (transformer) to call another lambda (source) without going through lambda.invoke, by requesting it through the (private) API gateway one more time instead. The background is that this makes debugging, testing and development more straightforward, because the main functions do not rely on AWS facilities. Running the transformer locally works fine: the container can request the API gateway, and in that case I replace aws_api_root_uri with http://host.docker.internal:3000, which launches another container for the source lambda. However, trying the same after deployment on AWS, I end up with

HTTPSConnectionPool(host='<id>.execute-api.eu-central-1.amazonaws.com', port=443): Max retries exceeded with url: /sandbox/sources/test (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f73ebda3790>: Failed to establish a new connection: [Errno -2] Name or service not known'))

This seems like a DNS problem accessing the private resource. Using the public internet on the same lambda works fine, i.e. google.com can be requested fine. On the other hand, accessing the same URI that produces an error in the lambda, but on an EC2 instance in the same VPC, works fine as well.

This rules out a couple of things:

It cannot be a general DNS access issue, since public names resolve from the same lambda.
The source lambda is available and can be requested on the network as expected.
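
To tell a failed name lookup apart from a refused or timed-out connection, the lookup can be tried on its own from inside the lambda. The following is a minimal sketch, not part of the original code; `can_resolve` and its `resolver` parameter are hypothetical names introduced here for illustration.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    `resolver` is injectable only so the check can be exercised without
    network access; by default it is the standard library lookup.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # "[Errno -2] Name or service not known" surfaces as socket.gaierror.
        return False
```

Printing `can_resolve("<id>.execute-api.eu-central-1.amazonaws.com")` next to `can_resolve("www.google.com")` in the transformer would show whether only the private name fails to resolve.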

Do I need to add a specific policy that a VPC EC2 instance implicitly has, but a lambda on that VPC does not? If so, which policy would that be? Or can you think of another problem that is causing this issue?

expected response when calling GET on the transformer URI

{"message": "hello world", "location": "<ip>", "added": "value"}

actual response

{"message": "Internal server error"}

template.yaml

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  SAM template

Globals:
  Function:
    Timeout: 10  # need a relatively long timeout for the local deployment - two container spin-ups in series needed

Parameters: 
  MyEndpointId:
    Type: String
    Description: The AWS endpoint ID for the API Gateway endpoint

Resources:
  MyTestSourceFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.8
      CodeUri: data_sources/test_source
      Handler: service.lambda_handler.handle_lambda
      Events:
        MyTestSourceApi:
          Type: Api
          Properties:
            RestApiId: !Ref MyApi
            Path: /sources/test
            Method: get

  MyTestTransformerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.8
      CodeUri: data_transformers/test_transformer
      Handler: service.lambda_handler.handle_lambda
      Events:
        MyTestTransformerApi:
          Type: Api
          Properties:
            RestApiId: !Ref MyApi
            Path: /transformers/test
            Method: get
  
  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: sandbox
      MethodSettings:
        - HttpMethod: '*'
          ResourcePath: /*/*/*
          LoggingLevel: ERROR
          ThrottlingBurstLimit: 5000
          ThrottlingRateLimit: 10000
      EndpointConfiguration: 
        Type: PRIVATE
        VPCEndpointIds:
          - !Ref MyEndpointId
      Auth:
        ResourcePolicy:
          CustomStatements: [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "execute-api:Invoke",
            "Resource": "execute-api:/*/*/*/*"
          }]

data_sources/test_source/service/lambda_handler.py

import json
from .main import generate_test_data

def handle_lambda(event, context):
    return {
        "statusCode": 200,
        "body": json.dumps(generate_test_data()),
    }

data_sources/test_source/service/main.py

import requests

def generate_test_data():
    try:
        ip_address = requests.get("http://checkip.amazonaws.com/").text.replace("\n", "")
    except requests.RequestException as exc:
        print(exc)
        raise exc

    return {
        "message": "hello world",
        "location": ip_address
    }

data_transformers/test_transformer/service/lambda_handler.py

import os
import json
from .main import load_and_transform

def handle_lambda(event, context):
    aws_api_root_uri = "https://{}.execute-api.{}.amazonaws.com/{}/".format(
        event['requestContext']['apiId'],
        os.environ['AWS_REGION'],
        event['requestContext']['stage']
    )
    return {
        "statusCode": 200,
        "body": json.dumps(load_and_transform(aws_api_root_uri)),
    }
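
The question mentions swapping the root URI for http://host.docker.internal:3000 when running locally. One way to make that switch explicit is sketched below. It assumes the SAM CLI sets the `AWS_SAM_LOCAL` environment variable to `true` for local invocations; `api_root_uri`, `LOCAL_API_ROOT` and the `environ` parameter are hypothetical names, not part of the original handler.

```python
import os

# Assumption: the address used for the local run described in the question.
LOCAL_API_ROOT = "http://host.docker.internal:3000/"


def api_root_uri(event, environ=os.environ):
    """Build the API root, switching to the local address under `sam local`."""
    # Assumption: the SAM CLI sets AWS_SAM_LOCAL=true for local invocations.
    if environ.get("AWS_SAM_LOCAL") == "true":
        return LOCAL_API_ROOT
    ctx = event["requestContext"]
    return "https://{}.execute-api.{}.amazonaws.com/{}/".format(
        ctx["apiId"], environ["AWS_REGION"], ctx["stage"]
    )
```

Passing `environ` as a parameter keeps the function easy to exercise without touching the real process environment.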

data_transformers/test_transformer/service/main.py

import urllib.parse
import requests

def load_and_transform(api_root_path):
    test_data = {}
    try:
        # do we have DNS working?
        print(requests.get("https://www.google.com/").text)

        # can we receive data from another lambda?
        uri = urllib.parse.urljoin(api_root_path, "sources/test")
        print(f"test transformer is sending query to {uri}")
        test_data = requests.get(uri).json()
        test_data["added"] = "value"
    except Exception as exc:
        print(exc)
        raise exc

    return test_data
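
A side note on `urllib.parse.urljoin` as used above: the trailing slash on the root URI matters, because without it the last path segment (the stage) is replaced rather than extended. A small illustration, using a placeholder host that is not from the original post:

```python
from urllib.parse import urljoin

# Placeholder host for illustration only.
with_slash = urljoin("https://example.invalid/sandbox/", "sources/test")
without_slash = urljoin("https://example.invalid/sandbox", "sources/test")

print(with_slash)     # https://example.invalid/sandbox/sources/test
print(without_slash)  # https://example.invalid/sources/test
```

The handler in the question already appends the slash after the stage name, so the joined URI keeps the stage.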

I'm pretty sure you'll need a resource policy on the API to allow the lambdas to connect. You'll probably also need to add those lambdas to the same VPC. But that's just based on a quick skim of the private API docs. I have not done this and do not know enough to provide a proper answer. — Corin, Sep 14 '20 at 13:51
they're already on the same VPC. Thank you, I'll look into resource policies. — Michel Müller, Sep 14 '20 at 14:01
Must be my lack of familiarity with SAM then. I do not see where in the template file the VPC for the lambdas is specified. But if that's already there, then the resource policies are what you're looking for. — Corin, Sep 14 '20 at 14:04
I mean maybe it's also my own unfamiliarity, but I'm passing in the endpoint through `MyEndpointId` to the api gateway. That way it gets deployed onto that endpoint, which resides in the respective VPC. Both lambdas are using `MyApi` so they both get attached there. — Michel Müller, Sep 14 '20 at 14:07
regarding resource policies: the API gateway already has one set, see the `ResourcePolicy` statement. That IMO should allow access to all consumers of those APIs on the same VPC (together with a manually set security group on the endpoint that allows the VPC's network range). — Michel Müller, Sep 14 '20 at 14:11
Right, so the API endpoint is in the VPC. That is clear. However, for your lambdas to be in that VPC, they need a VpcConfig (see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html#cfn-lambda-function-vpcconfig). — Corin, Sep 14 '20 at 14:20
This hit the spot! I was able to figure it out and I'm about to write an answer with the additional config needed. — Michel Müller, Sep 14 '20 at 15:29

score 0 · Answer 1 · answered Sep 14 '20 at 15:38

Thank you, @Corin, for providing the correct hint! I was missing the VPC config on the lambda functions as well as a policy to make it work:

additional parameters

Parameters:
  MySubnet1:
    Type: String
    Description: The subnet to use for lambda functions in the first availability zone
  MySG:
    Type: String
    Description: The security group to be used for the lambda functions

additional globals

This way the VPC config is shared among all the lambdas.

Globals:
  Function:
    VpcConfig:
      SubnetIds:
        - !Ref MySubnet1
      SecurityGroupIds:
        - !Ref MySG

additional policies per lambda

Without this, the VPC config fails because the lambdas' execution role lacks permission to manage network interfaces.

Resources:
  MyLambdaFunction:
    Properties:
      Policies:
        - VPCAccessPolicy: {}
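
After redeploying, it may be worth confirming that the functions really are attached to the VPC. A minimal sketch, assuming a boto3 Lambda client is passed in; `is_attached_to_vpc` is a hypothetical helper, and the only AWS call it relies on is `get_function_configuration`.

```python
def is_attached_to_vpc(lambda_client, function_name):
    """Return True if the deployed function reports subnets and security groups."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Functions outside a VPC may omit VpcConfig or report empty lists.
    vpc = config.get("VpcConfig") or {}
    return bool(vpc.get("SubnetIds")) and bool(vpc.get("SecurityGroupIds"))


# Usage (requires boto3 and AWS credentials):
# import boto3
# print(is_attached_to_vpc(boto3.client("lambda"), "<deployed function name>"))
```

The client is taken as an argument so the check can be tried against a stub before pointing it at a real account.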