Mysterious timeout when connecting to neptune db

Question

I'm getting this error message when trying to connect to an AWS Neptune DB from a Lambda:

2022-05-05T18:36:04.114Z    e0c9ee4c-0e1d-49c7-ad05-d8bab79d3ea6    WARN    Determining whether retriable error: Server error: {
    "requestId": "some value",
    "code": "TimeLimitExceededException",
    "detailedMessage": "A timeout occurred within the script or was otherwise cancelled directly during evaluation of [some value]"
} (598)

The timeout happens consistently after 20s.

It's not clear what's causing this. Things I've tried:

increasing the lambda memory in case it's just a hardware problem, but no luck
increasing neptune query timeout from 20s to 60s, but the request still times out at 20s.

This is the code of the lambda that tries to initialize the connection:

import { driver, structure } from 'gremlin';
import { getUrlAndHeaders } from 'gremlin-aws-sigv4/lib/utils';

const getConnectionDetails = () => {
    if (process.env['USE_IAM'] == 'true') {
        return getUrlAndHeaders(
            process.env['CLUSTER_ENDPOINT'],
            process.env['CLUSTER_PORT'],
            {},
            '/gremlin',
            'wss'
        );
    } else {
        const database_url =
            'wss://' +
            process.env['CLUSTER_ENDPOINT'] +
            ':' +
            process.env['CLUSTER_PORT'] +
            '/gremlin';
        return { url: database_url, headers: {} };
    }
};
const getConnection = () => {
    const { url, headers } = getConnectionDetails();

    const c = new driver.DriverRemoteConnection(url, {
        mimeType: 'application/vnd.gremlin-v2.0+json',
        headers: headers,
    });

    c._client._connection.on('close', (code, message) => {
        console.info(`close - ${code} ${message}`);
        if (code == 1006) {
            console.error('Connection closed prematurely');
            throw new Error('Connection closed prematurely');
        }
    });

    return c;
};

This was working previously with more powerful hardware (r4.2xlarge) for the Neptune DB, but I changed that to t3.medium to minimize cost, and that seems to be when the problem started. I find it hard to believe that the hardware change alone would cause the connection to time out, and it's odd that it keeps timing out at exactly 20s. Any ideas?

When you say you increased the Neptune timeout from 20s to 60s, did you make this change to both the parameter group assigned to the instance, and the cluster parameter group? If either is still 20s, that will take precedence. — Brian O'Keefe, May 05 '22 at 20:53
Adding to Brian's comment: did you restart the instance after making the change to the parameter group? — Kelvin Lawrence, May 05 '22 at 21:27
@BrianO'Keefe I made the change to the value of the parameter when re-deploying the cloudformation template I used. How can I confirm the difference you're asking about? [This is the linked cloudformation template](https://s3.amazonaws.com/aws-neptune-customer-samples/v2/cloudformation-templates/neptune-base-stack.json), it's the `NeptuneQueryTimeout` parameter and it's applied to the `DBParameterGroup`. — Uche Ozoemena, May 06 '22 at 09:17
@KelvinLawrence I redeployed by updating the parameter value via the cloudformation dashboard. That should've restarted the instance, right? — Uche Ozoemena, May 06 '22 at 09:18
Can you please issue a call to `/status` or, if using a Jupyter notebook, use `%status` and see what it shows the timeout as? The default cluster timeout is 2 minutes, so I assume at some point you lowered it to 20s? The status API will show the timeout in place for the instance you are connecting to. — Kelvin Lawrence, May 06 '22 at 15:07
Yeah that's exactly right it's still 20s in the output from `/status`. What is the correct way to change it? [This is the template](https://s3.amazonaws.com/aws-neptune-customer-samples/v2/cloudformation-templates/neptune-base-stack.json) I'm using. — Uche Ozoemena, May 07 '22 at 14:35
Just to add, today I've added the same query timeout to the cluster parameter group and redeployed but it didn't change the timeout reported by `/status`. What's the correct way to restart the instance as part of a redeploy? I've tried changing the supplied parameter values as well as the defaults specified in the template, but with no luck. — Uche Ozoemena, May 09 '22 at 10:21
Can you please clarify what you mean by redeploy? The parameter group can be changed in place for a running cluster. Are you forced to build a new cluster each time for some reason? Normally you would just edit that in place. — Kelvin Lawrence, May 09 '22 at 14:40
By redeploy I mean updating the stack so a new template and/or set of parameters is applied. — Uche Ozoemena, May 09 '22 at 14:41
OK, thanks for clarifying. You still need to restart the instance you are connecting to for those changes to take effect. If you were to look with the CLI, I suspect you will see those changes are "pending". — Kelvin Lawrence, May 09 '22 at 14:49
As Kelvin mentions, a reboot needs to occur for each instance in the cluster after making changes to the cluster or instance timeout values. Changing the parameter via console, CLI, or CloudFormation does not automatically kick off the reboot. You can see the latest reboots in the Neptune console, under each instance, under Logs & Events -> Recent Events. — Taylor Riggan, May 09 '22 at 14:51
Okay thanks for the info. Please bear with me if it's obvious to you but not to me, how do I restart the instance? I had a cloudformation stack that was put in an inconsistent state when I made changes to it outside the template (via the dashboard), and resolving that problem has been a PITA. So I'm trying to be as careful as possible with this one. — Uche Ozoemena, May 11 '22 at 08:13
You can restart an instance from the AWS Console (web page) or using the CLI `aws neptune reboot-db-instance` command — Kelvin Lawrence, May 17 '22 at 13:35

score 1 · Accepted Answer · answered May 17 '22 at 13:36

Once parameter group values are changed, the instance you are connecting to still needs to be restarted for them to take effect. You can do this:

From the AWS Console (web page) for Neptune
From the CLI using `aws neptune reboot-db-instance`
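
For anyone who wants to script this, here is a rough sketch of the steps discussed above, not a definitive implementation: check which instances still have parameter changes pending, reboot an instance through the CLI, then confirm the timeout reported by `/status`. The helper names (`pendingReboot`, `rebootArgs`, `rebootInstance`, `queryTimeoutMs`) are made up for illustration. It assumes the AWS CLI is installed and configured, that the output of `aws neptune describe-db-instances` marks unapplied instance parameter groups with `ParameterApplyStatus: "pending-reboot"`, and that the `/status` body reports the timeout under `settings.clusterQueryTimeoutInMs`; check both against your own output before relying on them.

```javascript
// Sketch only: helper names and instance identifiers are hypothetical.

// List instances whose parameter group changes are still waiting for a reboot.
// `described` is the parsed JSON from `aws neptune describe-db-instances`.
function pendingReboot(described) {
    return (described.DBInstances ?? [])
        .filter((db) =>
            (db.DBParameterGroups ?? []).some(
                (group) => group.ParameterApplyStatus === 'pending-reboot'
            )
        )
        .map((db) => db.DBInstanceIdentifier);
}

// Build the argument list for `aws neptune reboot-db-instance`.
function rebootArgs(instanceId) {
    return [
        'neptune',
        'reboot-db-instance',
        '--db-instance-identifier',
        instanceId,
        '--output',
        'json',
    ];
}

// Reboot one instance by shelling out to the AWS CLI (assumed to be on PATH).
async function rebootInstance(instanceId) {
    const { execFile } = await import('node:child_process');
    const { promisify } = await import('node:util');
    const { stdout } = await promisify(execFile)('aws', rebootArgs(instanceId));
    return JSON.parse(stdout);
}

// Read the effective query timeout (in ms) from a parsed /status response body.
// Assumption: the value lives under settings.clusterQueryTimeoutInMs.
function queryTimeoutMs(status) {
    const raw = status?.settings?.clusterQueryTimeoutInMs;
    return raw === undefined ? undefined : Number(raw);
}
```

After the reboot finishes, fetching `/status` again from inside the VPC and passing the parsed body to `queryTimeoutMs` should show the new value if the parameter change was applied. Every instance in the cluster needs its own reboot, so repeat the `rebootInstance` call for each identifier returned by `pendingReboot`.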

Kelvin Lawrence
