
vehicles --> accounts --> organizations <-- users

We have the above graph structure, where vehicles, accounts, organizations and users are vertex labels and the arrows indicate the edge direction.

Consider the following vertex counts:

organizations = 1
accounts per organization = 2
vehicles per account = 5000
users per organization = 100

Our requirement is: given two vertex IDs, find the set of all users and vehicles that are connected to them according to the above graph.

For example, if I have vertex1 = accounts:1 and vertex2 = organizations:1, find the set of users and vehicles that are attached to these two vertices.

We have the following query:

g.V('accounts:1').outE().otherV().hasId('organizations:1')
.V('accounts:1').inE().otherV().as('B')
.V('organizations:1').inE().otherV().as('A')
.select('A', 'B')

While this works, the query takes ~3.5 seconds to complete. We know that there are going to be 500000 traversers for this query.
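The 500000 figure is consistent with select('A', 'B') emitting one result per (user, vehicle) combination: 100 users times 5000 vehicles. A minimal Python sketch of that arithmetic follows. It uses made-up vertex IDs and plain in-memory lists, not Neptune or Gremlin, and only illustrates how the paired output compares in size to returning the two sets independently.

```python
from itertools import product

# Hypothetical IDs that mirror the counts given in the question.
users = [f"users:{i}" for i in range(100)]          # users per organization
vehicles = [f"vehicles:{i}" for i in range(5000)]   # vehicles per account

# Pairing every user ('A') with every vehicle ('B') yields the cross product.
pairs = [{"A": u, "B": v} for u, v in product(users, vehicles)]

# The same information expressed as two independent sets.
separate = {"users": set(users), "vehicles": set(vehicles)}
separate_size = len(separate["users"]) + len(separate["vehicles"])

print(len(pairs))      # 500000
print(separate_size)   # 5100
```

Under these assumptions the paired form carries 500000 rows, while the two sets together hold only 5100 distinct vertex IDs.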

Is there a better way to do this?

Thanks for the help

Edit #1: Attaching the query's profile API response

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(VertexId)@[A, B] {
        JoinGroupNode {
            JoinGroupNode {
                PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=1}
                PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
                PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
                PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
            }, finishers=[dedup(?3)]
            PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
            PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
            PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=102}
            PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102, indexTime=0, joinTime=128, numSearches=102, actualTotalOutput=102}
            PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100, indexTime=1, joinTime=6, numSearches=102, actualTotalOutput=100}
            PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=0, joinTime=128, numSearches=100, actualTotalOutput=100}
            PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=83, numSearches=100, actualTotalOutput=100}
            PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=1, joinTime=1, numSearches=1, actualTotalOutput=100}
            PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=100}
            PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=100}
            PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000, indexTime=0, joinTime=119, numSearches=1, actualTotalOutput=500000}
            PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000, indexTime=194, joinTime=142, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000, indexTime=183, joinTime=499, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000, indexTime=193, joinTime=858, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260, indexTime=360, joinTime=1372, numSearches=500}
        }, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep, Vertex(?3):EdgeOtherVertexStep, Vertex(?8):GraphStep, Edge(?13):VertexStep, Vertex(?10):EdgeOtherVertexStep, VertexId(?10):IdStep@[A], Vertex(?16):GraphStep, Edge(?21):VertexStep, Vertex(?18):EdgeOtherVertexStep, VertexId(?18):IdStep@[B]], joinStats=true, optimizationTime=329, maxVarId=24, executionTime=6279}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [SelectStep(last,[A, B])]

WARNING: >> SelectStep(last,[A, B]) << (or one of its children) is not supported natively yet

Physical Pipeline
=================
NeptuneGraphQueryStep@[A, B]
    |-- StartOp
    |-- JoinGroupOp
        |-- JoinGroupOp
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
            |-- FilterOp
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260})

Runtime (ms)
============
Query Execution: 6283.262
Serialization:   2120.104

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(VertexId)@[A, obje...                500000      500000        2502.636    41.43
NeptuneTraverserConverterStep                                     500000      500000        2580.098    42.71
SelectStep(last,[A, B])                              500000      500000         958.328    15.86
                                            >TOTAL                     -           -        6041.062        -

Predicates
==========
# of predicates: 37

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 500000
Output: <Removed for space>
Response serializer: application/vnd.gremlin-v3.0+gryo
Response size (bytes): 64,000,045


Index Operations
================
Query execution:
    # of statement index ops: 15915
    # of unique statement index ops: 15915
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 0
    # of terms materialized: 0
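
As a rough reading of the profile above, the following Python sketch recomputes the per-step share of time and the average payload per result from the reported figures. All numbers are copied from the profile output. The short step names are shortened labels chosen here for illustration, and the interpretation is an assumption rather than something Neptune reports directly.

```python
# Figures copied from the "Traversal Metrics" and "Results" sections above.
step_ms = {
    "NeptuneGraphQueryStep": 2502.636,
    "NeptuneTraverserConverterStep": 2580.098,
    "SelectStep": 958.328,
}
total_ms = 6041.062
result_count = 500_000
response_bytes = 64_000_045

# Share of the total traversal time spent in each step, in percent.
shares = {name: round(100 * ms / total_ms, 2) for name, ms in step_ms.items()}

# Average serialized size of one result.
bytes_per_result = response_bytes / result_count

print(shares)
print(round(bytes_per_result))  # 128
```

On these numbers, well over half of the measured time is spent converting and selecting the 500000 results rather than in the graph lookup itself, and the response is about 64 MB.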
zXor

1 Answer

If possible, always provide edge labels on traversal steps like in() and out(). Also, you do not need to specify inE().otherV() unless you need data from the edge; in() will suffice. As a first step I would try:

g.V('accounts:1').out(<labels>).hasId('organizations:1')
 .V('accounts:1').in(<labels>).as('B')
 .V('organizations:1').in(<labels>).as('A')
.select('A', 'B')

Where <labels> is a list of one or more edge labels, for example in('works-with','knows').

Using edge labels, especially on the in() steps, can help a lot in some cases. There are other rewrites that can be tried, but this is a good first step.

Kelvin Lawrence
  • Kelvin, in our case unfortunately all edge labels are the same, "in", so providing edge labels here did not help us. – zXor Feb 24 '22 at 15:10
  • Does the first line of your query yield more than one result? If it does, the `V` steps that follow will fan out that number of times. Is that really the behavior you want? – Kelvin Lawrence Feb 24 '22 at 15:15
  • No, `g.V('accounts:1').out().hasId('organizations:1')` is always going to give a single result; vertex IDs are unique. – zXor Feb 24 '22 at 15:19
  • Ok so you are just using it as a filter to make sure that that relationship exists? The next step is probably to get a `/profile` of the query if possible. I don't know if it will help but you could try replacing that second `V` with a `select`. Or try `g.V('accounts:1').where(out().hasId('organizations:1'))` instead, and get rid of the second `V` completely. – Kelvin Lawrence Feb 24 '22 at 15:34
  • That unfortunately did not help, Kelvin. I have edited the question with the profile() response. – zXor Feb 24 '22 at 16:01
  • Actually I meant the Neptune /profile endpoint - sorry if that was not clear. – Kelvin Lawrence Feb 24 '22 at 20:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/242404/discussion-between-zxor-and-kelvin-lawrence). – zXor Feb 25 '22 at 13:43