
I have created a Neptune instance in my AWS account and a load balancer (LB) so I can access it from my local machine to play around. The LB forwards all connections on port 80 to port 8182 on Neptune, so I can easily query it through the browser. For example, this is the output of /status:

// 20191211170323
// http://my-lb/status

{
  "status": "healthy",
  "startTime": "Mon Dec 09 20:06:21 UTC 2019",
  "dbEngineVersion": "1.0.2.1.R2",
  "role": "writer",
  "gremlin": {
    "version": "tinkerpop-3.4.1"
  },
  "sparql": {
    "version": "sparql-1.1"
  },
  "labMode": {
    "ObjectIndex": "disabled",
    "Streams": "disabled",
    "ReadWriteConflictDetection": "enabled"
  }
}

The problem is that when I try to connect through the Gremlin Console or from Java code, I get the following error:

gremlin> :remote connect tinkerpop.server conf/remote-neptune.yaml
ERROR org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler  - Could not process the response
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 403 Forbidden
    at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:226)
    at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker.finishHandshake(WebSocketClientHandshaker.java:276)
    at org.apache.tinkerpop.gremlin.driver.handler.WebSocketClientHandler.channelRead0(WebSocketClientHandler.java:69)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:617)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:534)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)

And my remote-neptune.yaml is as simple as:

hosts: [my-lb]
port: 80
connectionPool: { enableSsl: false}
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}

I have updated my AWS credentials, although I don't think that's related, since I'm accessing Neptune through the LB.

And the weirdest part is that this same scenario was working like a week ago :/

Any ideas? Thanks!

João Menighin
  • Hi there. I was able, using your YAML to connect from Gremlin console to my Neptune cluster via an ALB. The only difference I can see is my ALB is listening on port 8182. Just to clarify a couple of things. Are you using an ALB or an NLB and which Gremlin Console/JAR file version do you have? – Kelvin Lawrence Dec 12 '19 at 15:37
  • Have you enabled IAM auth on the instance? Did you try earlier without IAM auth before when it was working? – Ankit Gupta Dec 12 '19 at 18:16
  • Hi @KelvinLawrence. I'm using an ALB. I'm using Gremlin Console 3.4.4, and in my app (which doesn't work either) I'm using `gremlin-driver:3.4.4`. – João Menighin Dec 13 '19 at 12:01
  • @AnkitGupta do you mean in the ALB instance? I have not configured anything related to IAM... As far as I know, it is open n_n' – João Menighin Dec 13 '19 at 12:02
  • Would it be possible to test with Gremlin Console at the 3.4.2 or 3.4.1 level? There was a change to the way the console sends text using the Gremlin Binary wire format I believe either in 3.4.4 or 3.4.3 and Neptune currently is at the TinkerPop 3.4.1 level. Have you recently upgraded your Gremlin client by any chance to 3.4.4 ? My test was using the 3.4.2 level console. – Kelvin Lawrence Dec 13 '19 at 21:19
  • Guys it's working again out of the blue -_- I'm checking with the security guys if anything changed on the network configurations in my company on the last few days... But that's it, it's simply working again... – João Menighin Dec 16 '19 at 14:37
  • @JoãoMenighin I've posted a response on how you can investigate connection failures for the future. Do take a look, and let me know if that would be an acceptable answer for you. TL;DR - classify connection issues to be an L2/L3 issue vs L7 issue first, and that should be enough to streamline ideas on how to investigate further. We see this happen a lot. – The-Big-K Dec 24 '19 at 06:08

1 Answer


Looks like the problem resolved itself, but here are a few things to watch out for in case this happens again. If you see connection issues, the first step should be to check whether it's a network connectivity issue. (You mentioned that you were going to check whether something changed with regard to security groups, so do update if that was indeed the case.) To check whether it is a security group (SG) issue, log into your client instance and make a simple telnet call to the DB endpoint:

 telnet <endpoint> <port>

If it responds with "Connected", then you can be sure that your SGs are correct, and you are dealing with an application-layer problem.
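If telnet is not installed on the client, the same reachability check can be scripted. The following is only a minimal sketch in Python using the standard library; the function name `can_connect` is a hypothetical helper, not part of any AWS or TinkerPop tooling.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Equivalent to a successful "telnet <endpoint> <port>":
        # we only open the connection and close it again.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

With the values from the question's remote-neptune.yaml, the call would be `can_connect("my-lb", 80)`. A result of `True` only says the port is reachable; it says nothing about whether the WebSocket handshake will be accepted.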

As called out in the comments, some of the possible culprits are:

  1. You previously had a setup without IAM Auth in Neptune (not on the ALB) and have since enabled IAM Auth. (To be clear, I'm referring to IAM Auth on the database, not on some other component in between.)

  2. Gremlin client-server mismatches.

  3. Some explicit settings on the ALB that could hinder the requests.

There are a few other possibilities as well. To summarize, first classify whether it is an L2/L3 issue or an L7 issue, and investigate from there.
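That triage could be sketched roughly as follows in Python, assuming the endpoint speaks plain HTTP as in the question (port 80, SSL disabled). The function name `classify_failure` and its three return labels are hypothetical; `/status` is the path shown in the question.

```python
import http.client
import socket


def classify_failure(host: str, port: int, path: str = "/status",
                     timeout: float = 3.0) -> str:
    """Rough triage: returns 'network', 'application' or 'ok'."""
    # Step 1: can a TCP connection be opened at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return "network"

    # Step 2: the connection works, so ask the application layer.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return "application"
    finally:
        conn.close()
    # Anything other than 200 (for example 403) points at the application layer.
    return "ok" if status == 200 else "application"
```

A `"network"` result suggests looking at security groups and routing first, while `"application"` suggests looking at things like IAM Auth, client/server versions or ALB settings. Note that this only probes a plain HTTP path; it does not perform the WebSocket handshake that the Gremlin driver uses.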

The-Big-K