I have an issue with a request performance in Neptune. I have a graph like this :
hasId('A_id') (count: 1) -> out('has_group') (count: 12) -> out('has_class').hasLabel('C') (count: 9751) -> out('has_type').hasLabel('D') (count: 9749) -> out('has_element') (count: 472370) -> hasLabel(Within(11 elements label)) (count: 107233)
I replace all the label to simplify, but the graph is exactly like this. The within decrease a lot the performance of the query. For more details :
g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').count()
This request take less than 1 secondes and return 472370.
If I add the last hasLabel(whithin()) like this :
g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').hasLabel(P.within('element_1','element_2','element_3','element_4','element_5','element_6','element_7','element_8','element_9','element_10','element_11')).count()
The time decrease to 18 seconds. And when I profile the query we can see a joinTime which take at least 90% of the query execution time on the within :
Optimized Traversal
===================
Neptune steps: [
NeptuneCountGlobalStep {
JoinGroupNode {
PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
],
{estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12
}
PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
],
{estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945
}
PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
],
{estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751
}
PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
],
{estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226
}
PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
],
{estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749
}
PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
],
{estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370
}
PatternNode[(?15, <~label>, ?16, <~>) . project ask . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
],
{estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370
}
}, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep
], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=20799
}
}
]
The request seems quite simple and the volume not that much. The within is not the good approach here ? Do you have some clue to improve the query ?
EDIT 1:
I tried with the request provided by @saikiranboga and the number of index operation is large better (divided by 10) but the join time is still high. I'm quite confuse.
The index operation number before :
Index Operations
================
Query execution:
# of statement index ops: 525563
# of unique statement index ops: 525563
Duplication ratio: 1.0
# of terms materialized: 0
and after
Index Operations
================
Query execution:
# of statement index ops: 53666
# of unique statement index ops: 53666
Duplication ratio: 1.0
# of terms materialized: 0
Optimized Traversal
===================
Neptune steps: [
NeptuneCountGlobalStep {
JoinGroupNode {
PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
],
{estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=12
}
PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
],
{estimatedCardinality=208000, expectedTotalOutput=2000, indexTime=0, joinTime=10, numSearches=1, actualTotalOutput=10945
}
PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
],
{estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=4, joinTime=102, numSearches=10945, actualTotalOutput=9751
}
PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
],
{estimatedCardinality=9456689, expectedTotalOutput=11695, indexTime=13, joinTime=94, numSearches=10, actualTotalOutput=42226
}
PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
],
{estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=17, joinTime=341, numSearches=42226, actualTotalOutput=9749
}
PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
],
{estimatedCardinality=7919022, expectedTotalOutput=556904, indexTime=17, joinTime=411, numSearches=10, actualTotalOutput=472370
}
PatternNode[(?15, <~label>, ?16, <~>) . project ?16 . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
],
{estimatedCardinality=1145096, indexTime=848, joinTime=15268, numSearches=473
}
}, finishers=[dedup(?15)
], annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep@[element
], VertexLabel(?16):LabelStep
], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=17283
}
}
]