I am currently using JanusGraph 0.5.2. I have a graph with about 18 million vertices and 25 million edges.
I have two deployments of this graph: one backed by a 3-node Cassandra cluster and the other by a 6-node Cassandra cluster (both with a replication factor of 3).
I am running the following query on both of them:
g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').next()
The issue is that this query takes ~130ms on the 3 node cluster whereas it takes ~400ms on the 6 node cluster.
I have benchmarked around ten queries and this is the only one where there is a significant difference in performance between the two clusters.
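For anyone who wants to reproduce the comparison, here is a minimal client-side timing sketch. It is not necessarily how the numbers above were measured. `measure_latency` and `run_query` are hypothetical names, not part of JanusGraph or TinkerPop; `run_query` stands for any callable that submits the traversal to one cluster and waits for the result.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(run_query: Callable[[], object],
                    warmup: int = 5,
                    runs: int = 50) -> Dict[str, float]:
    """Call run_query repeatedly and return latency statistics in milliseconds.

    run_query is a hypothetical callable that submits the traversal and
    blocks until the result has been received.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are executed but not timed, so connection setup and
    # cold caches do not distort the measured samples.
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real query; replace with a call to the 3-node or
    # 6-node deployment.
    print(measure_latency(lambda: time.sleep(0.001), warmup=2, runs=20))
```

Running the same callable against each deployment and comparing the medians rather than single runs should show whether the ~130ms vs ~400ms gap is stable.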
I have tried running .profile() on both versions, and the outputs are almost identical in terms of the steps and time taken:
g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(label_A), o...                            1           1           4.582     0.39
    \_condition=(~label = label_A AND some_id = 123 AND data.name = value1)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@8000
    \_index=someVertexByNameComposite
  optimization                                                                                 0.028
  optimization                                                                                 0.907
  backend-query                                                        1                       3.012
    \_query=someVertexByNameComposite:multiKSQ[1]@8000
    \_limit=8000
RepeatStep([JanusGraphVertexStep(BOTH,[...                             2           2        1167.493    99.45
  HasStep([data.name.eq(...                                                                  803.247
  JanusGraphVertexStep(BOTH,[...                                   12934       12934         334.095
    \_condition=type[sample_edge]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    \_multi=true
    \_vertices=264
    optimization                                                                               0.073
    backend-query                                                    266                       5.640
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    optimization                                                                               0.028
    backend-query                                                  12689                     312.544
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
  PathFilterStep(simple)                                           12441       12441          10.980
  JanusGraphMultiQueryStep(RepeatEndStep)                           1187        1187          11.825
  RepeatEndStep                                                        2           2         810.468
RangeGlobalStep(0,1)                                                   1           1           0.419     0.04
PathStep([value(data.name)])                                           1           1           1.474     0.13
>TOTAL                                                                 -           -        1173.969        -
NOTE: You may have noticed that the profile above reports a total time of more than 1000ms. I believe this is a separate issue that is not related to this post.
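As a rough back-of-the-envelope reading of the profile output above, the sketch below divides the time of each backend-query row under JanusGraphVertexStep by its count. The counts and times are copied from the profile; `extra_ms_per_query` is a purely hypothetical value used only to illustrate how a small per-query difference would add up over this many backend queries. It is not a measurement from either cluster.

```python
# (count, total time in ms) for the two backend-query rows listed under
# JanusGraphVertexStep in the profile output.
backend_queries = [
    (266, 5.640),
    (12689, 312.544),
]


def per_query_ms(count: int, total_ms: float) -> float:
    """Average time per backend query in milliseconds."""
    return total_ms / count


def added_latency_ms(total_queries: int, extra_ms_per_query: float) -> float:
    """Total extra time if every backend query became slower by a fixed amount."""
    return total_queries * extra_ms_per_query


for count, total_ms in backend_queries:
    print(f"{count} backend queries, {per_query_ms(count, total_ms):.4f} ms each on average")

total_queries = sum(count for count, _ in backend_queries)

# Hypothetical assumption: each backend query is 0.02 ms slower on one cluster.
extra_ms_per_query = 0.02
print(f"{total_queries} backend queries in total; "
      f"+{extra_ms_per_query} ms each would add "
      f"{added_latency_ms(total_queries, extra_ms_per_query):.1f} ms")
```

If this reading is right, the traversal depends on a very large number of small backend reads, so it would be much more sensitive to per-read latency than the other queries I benchmarked.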
Some other points that might be helpful:
- The 3 and 6 node clusters are identical in terms of hardware
- We aren't running JanusGraph in embedded mode (where it is colocated with Cassandra); instead, it runs separately on its own server nodes
- As mentioned earlier, the slowness is only observed for path queries. For instance, here's an example of another traversal query where we observe the same latency across the 3 and 6 node clusters:
g.V().hasLabel('label_B').has('some_id', 123).has('data.name', 1234567).both('sample_edge').valueMap('data.field1', 'data.field2').next(10)
I'd really appreciate any input on figuring out why the query is 3x slower on 6 nodes.
Happy to provide more information as required!
Thank you!