I am trying to solve a performance issue with a traversal and have tracked it down to the order().by() step. It seems that order().by() greatly increases the number of statement index ops required (per the profiler) and dramatically slows down execution.
A non-ordered traversal is very fast:
g.V().hasLabel("post").limit(40)
execution time: 2 ms
index ops: 1
Adding a single ordering step adds thousands of index ops and runs much slower:
g.V().hasLabel("post").order().by("createdDate", desc).limit(40)
execution time: 62 ms
index ops: 3909
Adding a single filtering step adds thousands more index ops and runs even slower:
g.V().hasLabel("post").has("isActive", true).order().by("createdDate", desc).limit(40)
execution time: 113 ms
index ops: 7575
However, the same filtered traversal without ordering runs just as fast as the original unfiltered traversal:
g.V().hasLabel("post").has("isActive", true).limit(40)
execution time: 1 ms
index ops: 49
By the time we build out the actual traversal we run in production, there are around 12 filtering steps and 4 by() step-modulators, causing the traversal to take over 6000 ms to complete with over 33000 index ops. Removing the order().by() steps causes the same traversal to run fast (500 ms).
The issue seems to be with order().by() and the number of index ops required to sort. I have seen the performance issue noted here, but adding barrier() did not help. The traversal is also fully optimized, requiring no TinkerPop conversion.
I am running engine version 1.1.0.0 R1. There are about 5000 post vertices.
How can I improve the performance of this traversal?