Cosmos gremlin performance issue with query

Question

I am trying to run the query below, but it times out. Where is the inefficiency?

g.V().hasLabel('RiskLibrary','name','General Business','active','1').as('lib').select('lib').
outE('CONTAINS_RISK').select('lib').project('Risk Library','Risks').by('name').
by(out('CONTAINS_RISK').project('Name','Description','Impacts','Causes').by('name').by('description').
by(both('IMPACTS').project('name').by('name').fold()).
by(both('CAUSES').project('name').by('name').fold()).fold())

score 1 · Answer 1 · answered Mar 24 '20 at 10:55

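Something about your traversal seems incorrect. For starters, hasLabel() is being called with arguments that aren't labels: the property keys and values ('name', 'General Business', 'active', '1') are passed as if they were additional labels. I assume the traversal should be: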

g.V().has('RiskLibrary', 'name', 'General Business').has('active', '1').as('lib').
      select('lib').
      outE('CONTAINS_RISK').
      select('lib').
      project('Risk Library', 'Risks').
        by('name').
        by(out('CONTAINS_RISK').
           project('Name', 'Description', 'Impacts', 'Causes').
             by('name').
             by('description').
             by(both('IMPACTS').
                project('name').by('name').
                fold()).
             by(both('CAUSES').
                project('name').by('name').
                fold()).
             fold())

If that is correct, then I'd wonder what the purpose of the traversal is exactly. As it stands right now, I would expect it to run project('Risk Library', 'Risks') once for every single outE('CONTAINS_RISK') edge. You will get the same output for each of those edges, because select('lib') jumps back to the original vertex you traversed from. If you didn't intend that, you can imagine the significant query cost of performing the hefty project('Risk Library', 'Risks') over and over again.
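A quick way to see how often the projection is repeated is to count the edges the traversal fans out over. This is only a sketch that reuses the labels and property keys from the question; if it returns N, the traversal above builds the same 'Risk Library' map N times.

```groovy
// Count the CONTAINS_RISK edges leaving the matching library vertex.
// Each edge becomes one traverser, and each traverser re-runs the full project().
g.V().has('RiskLibrary', 'name', 'General Business').has('active', '1').
      outE('CONTAINS_RISK').
      count()
```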

Assuming the rest of your traversal is correct, I think you just need to get rid of the as('lib') step label and the steps between it and project('Risk Library', 'Risks'), thus:

g.V().has('RiskLibrary', 'name', 'General Business').has('active', '1').
      project('Risk Library', 'Risks').
        by('name').
        by(out('CONTAINS_RISK').
           project('Name', 'Description', 'Impacts', 'Causes').
             by('name').
             by('description').
             by(both('IMPACTS').
                project('name').by('name').
                fold()).
             by(both('CAUSES').
                project('name').by('name').
                fold()).
             fold())

That said, this may still be an expensive traversal to execute, depending on how many "IMPACTS" and "CAUSES" edges exist for each vertex on the other end of a "CONTAINS_RISK" edge.
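To get a feel for that fan-out before running the full projection, something like the following sketch could be used. It replaces the nested folds with counts, so it returns one small map per risk; the keys 'ImpactCount' and 'CauseCount' are names made up for this example.

```groovy
// Per risk, report how many IMPACTS and CAUSES neighbours the full query would have to fold.
// 'ImpactCount' and 'CauseCount' are illustrative keys, not part of the original traversal.
g.V().has('RiskLibrary', 'name', 'General Business').has('active', '1').
      out('CONTAINS_RISK').
      project('Name', 'ImpactCount', 'CauseCount').
        by('name').
        by(both('IMPACTS').count()).
        by(both('CAUSES').count())
```

Large counts here would suggest the cost lies in the nested by() modulators rather than in the library lookup. Cosmos DB also documents an executionProfile() step that can be appended to a traversal to inspect per-step timings, which may help confirm this.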

Thanks Stephen. So, a final question, how do we filter based on a property of the 'CONTAINS_RISK' edge? — Silvertooth, Mar 24 '20 at 23:51
change `out('CONTAINS_RISK')` to `outE('CONTAINS_RISK').has('property', value).inV()` — stephen mallette, Mar 24 '20 at 23:52
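Applied to the simplified traversal above, that change might look like the sketch below. The edge property 'status' and the value 'approved' are hypothetical placeholders; substitute the real property key and value from your graph.

```groovy
// Filter on a property of the CONTAINS_RISK edge before moving to the risk vertex.
// 'status' / 'approved' are assumed names for illustration only.
g.V().has('RiskLibrary', 'name', 'General Business').has('active', '1').
      project('Risk Library', 'Risks').
        by('name').
        by(outE('CONTAINS_RISK').has('status', 'approved').inV().
           project('Name', 'Description', 'Impacts', 'Causes').
             by('name').
             by('description').
             by(both('IMPACTS').
                project('name').by('name').
                fold()).
             by(both('CAUSES').
                project('name').by('name').
                fold()).
           fold())
```

Because the filter runs on the edge, risks whose CONTAINS_RISK edge does not match are dropped before the nested projections are evaluated.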
