
A few days ago, Google published this article: https://cloud.google.com/blog/big-data/2018/07/developing-a-janusgraph-backed-service-on-google-cloud-platform

According to the article, it is common to deploy JanusGraph as a separate instance behind an internal load balancer.

In my project we have pretty much the same architecture: Bigtable, GKE running JanusGraph, and an app that calls JanusGraph through a load balancer. The only difference (I don't know whether it matters) is that we don't have an internal load balancer; we have an "external"(?) one.

So, the question is: what is the state of load balancing when using the Gremlin driver in a Java application? Our research shows that it does not work. Since connections are stateful, requests are not distributed randomly across the JanusGraph replicas. Once a connection sticks to one replica, it stays with that replica until the end. Moreover, when that replica is killed, the connection simply hangs, without any exception, warning, or log entry. There is no information about the state of the connection at all. This is bad because, if we assume an automatic load balancer that spins up additional replicas when needed, it simply will not work.
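To make clear what we think is happening, here is a toy simulation. It is not our real code and uses no Gremlin driver or GCP API; the class `StickyConnectionDemo`, the `ConnectionLevelBalancer` helper, and the replica names are all hypothetical. It assumes the balancer picks a replica only when a connection is opened, which is our understanding of the setup and may not match the real balancer exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, no real Gremlin driver or GCP API involved.
public class StickyConnectionDemo {

    // Assumption: the balancer chooses a replica once, when a connection opens.
    static class ConnectionLevelBalancer {
        private final List<String> replicas;
        private int next = 0;

        ConnectionLevelBalancer(List<String> replicas) {
            this.replicas = replicas;
        }

        // Round-robin over replicas, applied per new connection, not per request.
        String openConnection() {
            String chosen = replicas.get(next % replicas.size());
            next++;
            return chosen;
        }
    }

    // Sends `requests` requests over `connections` long-lived connections
    // and counts how many requests each replica receives.
    static Map<String, Integer> simulate(List<String> replicas, int connections, int requests) {
        ConnectionLevelBalancer balancer = new ConnectionLevelBalancer(replicas);

        // Each connection is pinned to whatever replica it got when it was opened.
        List<String> pinned = new ArrayList<>();
        for (int i = 0; i < connections; i++) {
            pinned.add(balancer.openConnection());
        }

        Map<String, Integer> counts = new TreeMap<>();
        for (String replica : replicas) {
            counts.put(replica, 0);
        }
        // The client rotates over its connections, but never re-opens them.
        for (int i = 0; i < requests; i++) {
            counts.merge(pinned.get(i % connections), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("janus-0", "janus-1", "janus-2");
        // One long-lived connection: every request lands on the same replica.
        System.out.println(simulate(replicas, 1, 300));
        // Three connections: traffic spreads only because there are three connections.
        System.out.println(simulate(replicas, 3, 300));
    }
}
```

With a single long-lived connection, all 300 requests in this model go to `janus-0`, and a replica added later would receive nothing until a new connection is opened. That matches what we observe, although the real cause may differ.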

We are using JanusGraph 0.2.1 with the corresponding TinkerPop driver 3.2.9 (we have also tried many other combinations), and the behavior stays the same. Load balancing does not work for us, and neither does failover when a pod gets killed. To make this even worse, it is not really deterministic: we had some tests where it worked, but when we returned to the same test after a while, it didn't.

Do you, Stack Overflow users, have any idea what the current state of this problem is?

Michał
  • When you say "the connection somehow hangs" are you referring to the client side (i.e. driver) or your external load balancer? – stephen mallette Jul 25 '18 at 10:37
  • Well, client side. Because when I run a completely new instance of the client, it works perfectly well. – Michał Jul 25 '18 at 10:47
  • There was a bug in 3.2.9 and earlier, recently fixed for 3.2.10 (not released yet as of this comment), which left the driver in a hanging state if there was a disorderly shutdown of the server. Perhaps you can try 3.2.10-SNAPSHOT to see if that resolves the hanging issue. – stephen mallette Jul 25 '18 at 11:24
  • Sure, I am checking this now. Do you have a comment regarding load balancing? I mean, is our thinking correct when it comes to balancing throughput across replicas, and that it doesn't really work that way? – Michał Jul 25 '18 at 12:10
  • Yes, though I don't think I understand the need for sticky sessions. – stephen mallette Jul 25 '18 at 12:18
  • I don't think I understand. You don't understand why sticky sessions are introduced in gremlin-driver? – Michał Jul 25 '18 at 12:21
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/176716/discussion-between-michal-and-stephen-mallette). – Michał Jul 25 '18 at 12:24

0 Answers