vehicles --> accounts --> organizations <-- users
We have the graph structure above, where vehicles, accounts, organizations, and users are vertex labels and the arrows indicate edge direction.
Consider the following vertex counts:
organizations = 1
accounts per organization = 2
vehicles per account = 5000
users per organization = 100
Our requirement is: given two vertex IDs, find the set of all users and vehicles that are connected to them in the graph above.
For example, given vertex1 = accounts:1 and vertex2 = organizations:1, find the set of users and vehicles that belong to these two vertices.
We have the following query:
g.V('accounts:1').outE().otherV().hasId('organizations:1')
.V('accounts:1').inE().otherV().as('B')
.V('organizations:1').inE().otherV().as('A')
.select('A', 'B')
While this works, the query takes ~3.5 seconds to complete. We know it produces 500000 traversers, because select('A', 'B') pairs each of the 100 users with each of the 5000 vehicles.
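To make the size of the result concrete, here is a minimal Python model of the counts above. It is not Gremlin and does not touch the database; the ID strings are made up for illustration. It only compares how many rows the paired result has with how many distinct vertices we actually need.

```python
from itertools import product

# Hypothetical IDs that mirror the counts in the question.
users = [f"users:{i}" for i in range(1, 101)]         # 100 users on organizations:1
vehicles = [f"vehicles:{i}" for i in range(1, 5001)]  # 5000 vehicles on accounts:1

# Pairing both labels yields one row per (user, vehicle) combination.
paired_rows = list(product(users, vehicles))

# Keeping each set once needs only one entry per vertex.
separate_sets = {"users": set(users), "vehicles": set(vehicles)}
distinct_vertices = len(separate_sets["users"]) + len(separate_sets["vehicles"])

print(len(paired_rows))    # 500000
print(distinct_vertices)   # 5100
```

So the information we want is 5100 vertex IDs, but the current result shape carries 500000 rows.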
Is there a better way to do this?
Thanks for the help.
Edit #1: Attaching the query's profile API response.
Optimized Traversal
===================
Neptune steps:
[
NeptuneGraphQueryStep(VertexId)@[A, B] {
JoinGroupNode {
JoinGroupNode {
PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=1}
PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
}, finishers=[dedup(?3)]
PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=102}
PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102, indexTime=0, joinTime=128, numSearches=102, actualTotalOutput=102}
PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100, indexTime=1, joinTime=6, numSearches=102, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=0, joinTime=128, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=83, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=1, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=100}
PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000, indexTime=0, joinTime=119, numSearches=1, actualTotalOutput=500000}
PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000, indexTime=194, joinTime=142, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000, indexTime=183, joinTime=499, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000, indexTime=193, joinTime=858, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260, indexTime=360, joinTime=1372, numSearches=500}
}, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep, Vertex(?3):EdgeOtherVertexStep, Vertex(?8):GraphStep, Edge(?13):VertexStep, Vertex(?10):EdgeOtherVertexStep, VertexId(?10):IdStep@[A], Vertex(?16):GraphStep, Edge(?21):VertexStep, Vertex(?18):EdgeOtherVertexStep, VertexId(?18):IdStep@[B]], joinStats=true, optimizationTime=329, maxVarId=24, executionTime=6279}
},
NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [SelectStep(last,[A, B])]
WARNING: >> SelectStep(last,[A, B]) << (or one of its children) is not supported natively yet
Physical Pipeline
=================
NeptuneGraphQueryStep@[A, B]
|-- StartOp
|-- JoinGroupOp
|-- JoinGroupOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- FilterOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260})
Runtime (ms)
============
Query Execution: 6283.262
Serialization: 2120.104
Traversal Metrics
=================
Step Count Traversers Time (ms) % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(VertexId)@[A, obje... 500000 500000 2502.636 41.43
NeptuneTraverserConverterStep 500000 500000 2580.098 42.71
SelectStep(last,[A, B]) 500000 500000 958.328 15.86
>TOTAL - - 6041.062 -
Predicates
==========
# of predicates: 37
WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
Results
=======
Count: 500000
Output: <Removed for space>
Response serializer: application/vnd.gremlin-v3.0+gryo
Response size (bytes): 64,000,045
Index Operations
================
Query execution:
# of statement index ops: 15915
# of unique statement index ops: 15915
Duplication ratio: 1.0
# of terms materialized: 0
Serialization:
# of statement index ops: 0
# of terms materialized: 0
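As a rough reading of the profile above, the following Python sketch recomputes two figures from the reported numbers: the average response size per row and each step's share of the total step time. All values are copied from the profile output; the variable names are only illustrative.

```python
# Values copied from the profile output above.
rows = 500_000
response_bytes = 64_000_045
step_ms = {
    "NeptuneGraphQueryStep": 2502.636,
    "NeptuneTraverserConverterStep": 2580.098,
    "SelectStep": 958.328,
}

# Average serialized size of one result row.
bytes_per_row = response_bytes / rows

# Share of the total step time spent in each step, in percent.
total_ms = sum(step_ms.values())
shares = {name: round(100 * ms / total_ms, 2) for name, ms in step_ms.items()}

print(round(bytes_per_row, 2))  # 128.0
print(shares)
```

By these numbers each row costs about 128 bytes, and more than half of the step time is spent in NeptuneTraverserConverterStep and SelectStep, i.e. after the graph pattern has already been matched.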