5

I have created a simple AWS Neptune cluster, with a writer and no read replicas. I used the option to create a new VPC for it, and two security groups were automatically created for it, too.

I also have a Lambda that calls that Neptune cluster's endpoint. I have configured the Lambda with the Neptune cluster's VPC, specifying all of its subnets and the two security groups mentioned above. I didn't manually modify the inbound and outbound rules after they were automatically assigned when I went through the VPC configuration steps in the AWS Console.

The Lambda is written in Python and uses the requests library to make HTTPS calls, signed with AWS Signature Version 4. The execution role for the Lambda has NeptuneFullAccess and an inline policy to allow configuring a VPC for the Lambda (which has been done, so that policy works).

The Lambda calls the Neptune cluster's endpoint, with the cluster's name and ID redacted, on port 8182:

https://NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182

I get the following error:

{
  "errorMessage": "2020-05-20T21:26:35.066Z c8ee70ac-6390-48fd-a32e-36f80d58a24e Task timed out after 3.00 seconds"
}

What am I doing wrong?

UPDATE: So, it looks like the second security group for the Neptune cluster was created because I selected an option when creating the cluster. So, I tried again with the Choose existing option for the security group, instead of Create new. (I guess I was confused before, because I was creating a whole new VPC, so how could a security group already exist? But the wizard just assumes the default security group that will have been created by then.)

Now, I no longer get the same error. However, what I see is this:

{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: 48e3b4fb-1b88-48d3-8834-247dbb1a4f3f Error: Runtime exited without providing a reason"
}

The log shows this:

{
  "requestId": "b8b91c18-34cd-c5f6-9103-ed3357b9241e",
  "code": "BadRequestException",
  "detailedMessage": "Bad request."
}

The query was (given the Lambda code described in https://docs.amazonaws.cn/en_us/neptune/latest/userguide/iam-auth-connecting-python.html):

{
  "host": "NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182",
  "method": "GET",
  "query_type": "status",
  "query": ""
}
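For readers following along, here is a minimal sketch of how an event like the one above presumably maps to the URL that gets requested. The function name build_request_url and the ENDPOINT_PATHS table are hypothetical, not taken from the AWS script; the only mapping confirmed by this thread is status to /status/, which appears in the later error message.

```python
# Hypothetical mapping from query_type to Neptune HTTP path.
# Only the "status" entry is confirmed by the error output in this thread.
ENDPOINT_PATHS = {
    "status": "/status/",
    "gremlin": "/gremlin/",
    "sparql": "/sparql/",
}


def build_request_url(event):
    """Build the HTTPS URL for a test event (host already includes the port)."""
    query_type = event["query_type"]
    if query_type not in ENDPOINT_PATHS:
        raise ValueError("unsupported query_type: " + query_type)
    return "https://" + event["host"] + ENDPOINT_PATHS[query_type]


event = {
    "host": "NAME.cluster-ID.us-east-1.neptune.amazonaws.com:8182",
    "method": "GET",
    "query_type": "status",
    "query": "",
}
url = build_request_url(event)
```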

Any suggestions?

UPDATE: Trying against another Neptune cluster, the '[Errno 111] Connection refused' error comes back. I have noticed an odd thing, however: I have some orphaned network interfaces, from when the Lambda was associated with the VPCs of now-deleted Neptune clusters. The network interfaces are marked as in use, however, and I cannot detach and delete them, not even with the Force detachment option. I get the error: You are not allowed to manage 'ela-attach' attachments.

UPDATE: Starting with a fresh Lambda (without redoing its VPC configuration, and so with no orphaned network interfaces anymore) and a fresh Neptune cluster with IAM Auth enabled and configured (and even with the Lambda's execution role given full admin access for the purposes of debugging, to rule out missing permissions), I am still getting this error:

{
  "errorMessage": "HTTPSConnectionPool(host='NAME.cluster-ID.us-east-1.neptune.amazonaws.com', port=8182): Max retries exceeded with url: /status/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1f9f98c310>: Failed to establish a new connection: [Errno 111] Connection refused'))",
  "errorType": "ConnectionError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 71, in lambda_handler\n    return make_signed_request(host, method, query_type, query)\n",
    "  File \"/var/task/lambda_function.py\", line 264, in make_signed_request\n    r = requests.get(request_url, headers=headers, verify=False, params=request_parameters)\n",
    "  File \"/var/task/requests/api.py\", line 76, in get\n    return request('get', url, params=params, **kwargs)\n",
    "  File \"/var/task/requests/api.py\", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n",
    "  File \"/var/task/requests/sessions.py\", line 530, in request\n    resp = self.send(prep, **send_kwargs)\n",
    "  File \"/var/task/requests/sessions.py\", line 643, in send\n    r = adapter.send(request, **kwargs)\n",
    "  File \"/var/task/requests/adapters.py\", line 516, in send\n    raise ConnectionError(e, request=request)\n"
  ]
}
silverberry

3 Answers

3

A few things to check:

  • Is the security group attached to the Neptune instance allowing traffic from the subnets that are configured for the Lambda function? The default inbound rule for the security group attached to Neptune is to only allow traffic from the IP address from which it was provisioned.

  • The NeptuneFullAccess built-in IAM policy is for control plane actions, not for data plane operations. You'll need to create an IAM policy using the policy document defined here [1] and attach that policy to whichever Lambda execution role you are using. Then, you need to use that role to sign the request being made to Neptune. The Python requests library does not do SigV4 signing, so you'll need to follow a procedure similar to what is laid out here [2].

  • If you really want to simplify all of this, we've published a Python library that helps with managing connections, IAM auth, and sending queries to Neptune. You can find it here [3].

[1] https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth.html

[2] https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-python.html

[3] https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils
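To give a sense of what the manual signing in the second bullet involves, here is a minimal sketch of the SigV4 signing-key derivation using only the standard library. The function names and the placeholder secret are hypothetical; "neptune-db" is assumed to be the service name for Neptune data plane requests. A full request also needs a canonical request and a string to sign, which the guide in [2] walks through.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="neptune-db"):
    """Derive the SigV4 signing key: secret -> date -> region -> service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder credentials for illustration only; in a Lambda the real values
# come from the execution role's temporary credentials.
signing_key = derive_signing_key("EXAMPLE-SECRET-KEY", "20200520", "us-east-1")
```

The derived key is scoped to one day, one region, and one service, which is why a request signed for a different service name or region is rejected.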

Taylor Riggan
  • Thanks! I did enable IAM auth and add the [1] policy to the Lambda's role. But I thought it to be unnecessary due to NeptuneFullAccess, so I didn't mention it. I do, in fact, use [2]. But the provisioning IP as the default is surpassing strange. I have added the other security group there as the source and now get a different error: – silverberry May 21 '20 at 00:46
  • "HTTPSConnectionPool(host='NAME.cluster-ID.us-east-1.neptune.amazonaws.com', port=8182): Max retries exceeded with url: /status/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))" – silverberry May 21 '20 at 00:46
  • I was able to recreate the error. When you created your Neptune cluster, did you happen to change its port to something other than 8182? When I changed my cluster's port to something other than 8182 but still attempted to connect on port 8182, I got the same error message that you received above (Connection refused). – Taylor Riggan May 21 '20 at 14:43
  • Many thanks, Taylor. But no, I didn't change the port. I did create it with a read replica, though, which I deleted afterwards. The only other changes from the default are the IAM auth enabled and the one to do with the security groups above. – silverberry May 21 '20 at 15:28
  • Oh, wait. I think the second security group was created because of an option I selected during the cluster's creation. Now trying without that... – silverberry May 21 '20 at 16:15
2

Thanks to the help of the Neptune team (an amazing response! they called me to discuss this), I was able to figure this out.

First, the Connection refused error disappeared once I redid the setup with a fresh Neptune cluster and the Use existing option for the security group, as well as a brand new Lambda added to the Neptune cluster's VPC. Apparently, redoing VPC configuration on a Lambda sometimes leaves orphaned network interfaces that are hard to delete. So, do the VPC config on a Lambda only once!

Second, the runtime error that started showing up after that is due to a bug in the Python code provided by AWS here: https://docs.aws.amazon.com/neptune/latest/userguide/iam-auth-connecting-python.html

Namely, the make_signed_request function in that script doesn't return a value. It should return r.text or, better yet, json.loads(r.text). Then, everything works just fine.
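A minimal sketch of that fix, assuming the response body is usually JSON. The helper name parse_response_body is hypothetical; the commented lines show where it would slot into make_signed_request, based on the requests.get call visible in the stack trace above.

```python
import json


def parse_response_body(body_text):
    """Return the decoded JSON document, or the raw text if it is not JSON."""
    try:
        return json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return body_text


# Inside make_signed_request, after the request has been sent:
#     r = requests.get(request_url, headers=headers, verify=False,
#                      params=request_parameters)
#     return parse_response_body(r.text)
```

Falling back to the raw text keeps the handler from raising if the endpoint ever answers with something that is not JSON.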

silverberry
0

From your error message:

Task timed out after 3.00 seconds

You have to increase your Lambda execution timeout, as your current setting of 3 seconds is not enough for the function to complete successfully:

The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.

If your function runs longer than the configured timeout, the Lambda service terminates it.

As a side note:

Since you use Lambda in a VPC, you have to remember that Lambda functions have neither public IPs nor internet access. You may not be able to connect to your db even if you increase the function timeout. This can be overcome if you run your Lambda function in a private subnet and have a NAT gateway or NAT instance correctly set up.
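One way to tell a slow function apart from a network problem is to probe the port directly with a short timeout. This is only a sketch with a hypothetical helper name, probe_tcp. As a rule of thumb (not a guarantee), a timeout suggests packets are being dropped, for example by a security group, while a refusal suggests the host was reached but nothing is listening on that port.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try a plain TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"   # no answer at all within the time limit
    except ConnectionRefusedError:
        return "refused"   # host answered, but the port is closed
    except OSError as exc:
        return "error: " + str(exc)  # e.g. DNS resolution failure


# Example call from inside the Lambda handler (endpoint redacted as above):
#     probe_tcp("NAME.cluster-ID.us-east-1.neptune.amazonaws.com", 8182)
```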

Marcin
  • Thanks, but it doesn't seem to be it. After some changes, I get a different error, and it takes less than 100ms now. I doubt a simple status request should even take this long. – silverberry May 21 '20 at 00:44
  • @silverberry That's what I indicated in my answer as well. Are your VPC and Lambda function properly set up to connect to the db? – Marcin May 21 '20 at 01:39
  • Yeah, trying to figure out how to properly set it up. A public IP should not be necessary, since VPC configuration for Lambda was introduced specifically to address this. I'll make it public only as the last resort. – silverberry May 21 '20 at 15:39