I am using `withRemote` to connect my Java application to a Gremlin Server running in AWS with a DynamoDB storage backend. I am getting a connection timeout after a few seconds (~3.3 seconds):

org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException]]

I need to figure out how to reconnect, which first means detecting that the connection is closed, and I am not sure how to detect that. I only get the above exception when I use the graph traversal. Is there a way to discover a closed connection beforehand and reconnect, or is there a configuration option that reconnects automatically (for example, by creating a new connection before the current one closes) so that my application is always connected?

In case it helps, this is how I create the connection. Currently the connection setup is a singleton created when the application starts:

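  // Empty local graph: only used to spawn a traversal source whose steps execute on the remote server
  this.graph = EmptyGraph.instance();
  // Gryo serializer with JanusGraph's IoRegistry so JanusGraph-specific types can be (de)serialized
  GryoMessageSerializerV1d0 gryoMessageSerializerV1d0 = new GryoMessageSerializerV1d0(
        GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()));
  // One Cluster (and its connection pool) shared by the whole application
  this.cluster = Cluster.build().serializer(gryoMessageSerializerV1d0)
        .addContactPoint(configuration.getString("graphDb.host", "localhost"))
        .port(configuration.getInt("graphDb.port", 8182)).create();
  // Remote traversal source: traversals built from it are submitted through the cluster
  this.graphTraversalSource = this.graph.traversal().withRemote(DriverRemoteConnection.using(cluster));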
Asked by monali01 (edited by ketan vijayvargiya)
  • I would first try to figure out why the connection loss is happening. Also, I am removing the DynamoDB tag, as the problem has nothing to do with it. – ketan vijayvargiya Sep 25 '17 at 02:10
  • I figured out why the connection loss is happening. My AWS load balancer's TCP idle timeout was set to 60 seconds, and most of my Gremlin calls created data without returning anything from Gremlin Server, so the connection timed out. I still need to figure out how to check whether the connection is still active before making any Gremlin requests, and reconnect if it is not. Does anyone know how to check that? – monali01 Sep 26 '17 at 04:17
  • Hi @monali01, I'm facing the same issue in my Java application. Like you did previously, I run this in a loop: `this.graphTraversalSource = this.graph.traversal().withRemote(DriverRemoteConnection.using(cluster));`, then a `g.V().addV()...` request, then `this.graphTraversalSource.close()`. Can you share how you solved the issue? Thanks in advance. – mmr25 Sep 29 '20 at 17:12

1 Answer

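I think this problem is already addressed by the driver's keep-alive setting (`keepAliveInterval` on the `Cluster` builder, `connectionPool.keepAliveInterval` in the driver's YAML configuration). It defaults to 180 seconds, which is longer than the 60-second idle timeout on your load balancer, so the load balancer closes the idle connection before the driver ever sends a keep-alive. Setting it below 60 seconds should let the keep-alive traffic hold the connection open.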
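
A minimal sketch of that timing argument in plain Java, with no driver dependency. `KeepAliveCheck`, `survivesIdleTimeout` and `suggestedKeepAliveMillis` are hypothetical names, and the model assumes the load balancer closes a connection once it has been idle longer than its idle timeout, so a keep-alive must fire strictly before that.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: does a client keep-alive beat a load balancer idle timeout? */
final class KeepAliveCheck {

    private KeepAliveCheck() {}

    /**
     * True if a keep-alive sent every keepAliveMillis goes out before the idle
     * timeout closes the connection. Zero means keep-alive is disabled.
     */
    static boolean survivesIdleTimeout(long keepAliveMillis, long idleTimeoutMillis) {
        return keepAliveMillis > 0 && keepAliveMillis < idleTimeoutMillis;
    }

    /** Assumption: half the idle timeout leaves headroom for scheduling jitter. */
    static long suggestedKeepAliveMillis(long idleTimeoutMillis) {
        return idleTimeoutMillis / 2;
    }

    public static void main(String[] args) {
        long idleTimeout = TimeUnit.SECONDS.toMillis(60);    // load balancer idle timeout from the question
        long driverDefault = TimeUnit.SECONDS.toMillis(180); // default described in the answer

        System.out.println("default survives: " + survivesIdleTimeout(driverDefault, idleTimeout));
        long suggested = suggestedKeepAliveMillis(idleTimeout);
        System.out.println("suggested keep-alive ms: " + suggested
                + ", survives: " + survivesIdleTimeout(suggested, idleTimeout));
    }
}
```

With the Java driver, the resulting value would presumably be passed to the builder, e.g. `Cluster.build().keepAliveInterval(30000)`; check the option name against the driver version you use.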

That said, the driver should be reconnecting on its own; it keeps retrying at the rate set by `connectionPool.reconnectInterval`. Perhaps there is a condition where you are quickly exhausting all the connections to the point of getting that error, but I'm not sure. Either way, hopefully the
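
Even when the driver reconnects by itself, a request submitted while a connection is being re-established can still fail. A common application-side complement is to retry such a request a bounded number of times with a short pause. This is a plain-Java sketch; `Retry.withRetry`, the attempt count and the fixed delay are hypothetical choices, not part of the TinkerPop API, and it assumes the request is safe to repeat (a write such as `addV()` may create duplicates if the first attempt actually reached the server).

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries an action that may fail while a connection is re-established. */
final class Retry {

    private Retry() {}

    static <T> T withRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a wrapped ClosedChannelException from the remote connection
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis); // give the driver time to reconnect
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request: fails twice, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException("simulated closed connection");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the question's setup, the supplier would wrap one traversal against the shared `g`; the wrapper itself never recreates the traversal source.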

Answered by stephen mallette
  • "but perhaps there is a condition where you're quickly exhausting all the connections to the point of getting that error" - how many total connections are allowed? Another thing I noticed: if I use a different connection per request, it goes up to ~1869 connections and then the server starts throwing this exception: [gremlin-server-boss-1] WARN io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. java.io.IOException: Too many open files – monali01 Sep 29 '17 at 17:46
  • What I do, repeated in a loop: `this.graphTraversalSource = this.graph.traversal().withRemote(DriverRemoteConnection.using(cluster));`, then a `g.V().addV()...` request, then `this.graphTraversalSource.close()`, and so on. – monali01 Sep 29 '17 at 17:47
  • I don't know how many can be allowed; I just meant that whatever your client connection pool was configured for might have been exhausted. "Too many open files" is a fairly common Linux error, and if you search for that phrase you'll find plenty of solutions. Looking at the code in your comment, though, I can see why the server is showing so many open connections. There is no need to recreate the `TraversalSource` over and over like that. Just do this once, `g = graph.traversal().withRemote(...)`, and re-use `g`. – stephen mallette Sep 30 '17 at 10:31
  • I had it that way originally, but once I resolved the connection timeout problem I was able to get through more requests (~1000), and then nothing happens: no errors, and eventually gremlin-server crashes. I don't see any errors in the server logs or in my Java app, where I enabled TRACE-level logging. A few times the server doesn't crash, but requests aren't sent either; maybe it is not finding a connection? I do see messages related to keeping the connection alive: Request sent to server to keep Connection{host=***}} alive and the response Received response from keep-alive request – monali01 Sep 30 '17 at 15:28
  • Well, you definitely don't want to keep opening and closing remote TraversalSource instances like that. Open it once and re-use it. Can you easily recreate this problem where it handles only 1000 requests and then dies? If so, I suggest you come up with the most minimal configuration/setup you can and describe it on the gremlin-users mailing list. – stephen mallette Oct 02 '17 at 10:45
  • I can try to come up with a configuration that kills the server and provide more details on the mailing list. I was able to create 45k vertices by using the storage.batch-loading=true option. I still ran into the server crash when I created them and tried to get the vertex.id back to use it later for creating edges. I am still figuring out how all the configuration values fit our setup. – monali01 Oct 03 '17 at 00:58
  • How do I load balance connections on AWS between multiple gremlin-servers if I open only one remote TraversalSource instance at the beginning of the application lifecycle? If needed, I can submit this as a new question. – monali01 Oct 03 '17 at 02:09
  • The `Cluster` object you hand to `withRemote()` will round-robin requests to different servers, though you could also put a load balancer in front of your Gremlin Servers (that's a fairly common pattern). – stephen mallette Oct 03 '17 at 10:17
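
The two pieces of advice in these comments (build the remote traversal source once and re-use it, and let the `Cluster` spread requests across servers) can be modelled without the driver. In this sketch `SharedClient` and `RoundRobin` are hypothetical stand-ins: the factory plays the role of creating the `Cluster` and `g` and runs only once, and each request picks the next host in turn. The host names are placeholders, and the real driver's host selection may be more involved than this simple rotation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for "create the remote traversal source once, re-use it everywhere". */
final class SharedClient<T> {
    private final Supplier<T> factory;
    private final AtomicInteger creations = new AtomicInteger();
    private T instance;

    SharedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Lazily creates the instance on first use, then always returns that same instance. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
            creations.incrementAndGet();
        }
        return instance;
    }

    int creations() {
        return creations.get();
    }

    public static void main(String[] args) {
        SharedClient<RoundRobin> client = new SharedClient<>(
                () -> new RoundRobin(List.of("gremlin-1", "gremlin-2", "gremlin-3")));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + client.get().nextHost());
        }
        System.out.println("clients created: " + client.creations());
    }
}

/** Hypothetical model of round-robin host selection across several Gremlin Servers. */
final class RoundRobin {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts");
        }
        this.hosts = List.copyOf(hosts);
    }

    String nextHost() {
        // floorMod keeps the index valid even after the counter overflows
        return hosts.get(Math.floorMod(next.getAndIncrement(), hosts.size()));
    }
}
```

With the real driver, the rough equivalent is a single `Cluster.build().addContactPoints(...)...create()` call at startup, with several host names passed to `addContactPoints`, and one `g` shared by every request handler instead of a new traversal source per request.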