I have the following model, which I'd like to represent as a graph in Azure Cosmos DB.

A user can be in multiple groups, and both users and groups can have multiple permissions attached. I want an efficient query that, starting from a user, returns all the permissions attached to it, either directly or via a group. Note that a user and a group may be assigned the same permission, and in that case I want to get it just once. I came up with this query:

 g.V().hasLabel('user').has('userid', '0_2147483647').repeat(out().simplePath()).until(hasLabel('permission'))

This query is not very efficient when there is a lot of data, so the question is: can we make it better?

macpak

1 Answer

I don't see a reason to use repeat() here as the depth of your traversal is known. I would just do:

g.V().has('user', 'userid', '0_2147483647').
  union(out('has'),
        out('isingroup').out('has')).
  dedup()
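
To make the shape of that traversal concrete, here is a minimal in-memory sketch in Python. It is not Gremlin and not how Cosmos DB evaluates the query; the vertex ids (`u1`, `g1`, `p1`, ...) and the helper names `out`, `dedup`, and `permissions_for` are hypothetical, chosen only to mirror the steps above. It shows why the `dedup()` at the end matters when a user and a group share a permission.

```python
# Toy adjacency map keyed by (vertex, edge label).
# All ids here are made-up illustration data, not taken from the question.
EDGES = {
    ("u1", "has"): ["p1"],
    ("u1", "isingroup"): ["g1", "g2"],
    ("g1", "has"): ["p1", "p2"],
    ("g2", "has"): ["p2", "p3"],
}


def out(vertices, label):
    """Follow outgoing edges with the given label, loosely like out(label)."""
    return [target for v in vertices for target in EDGES.get((v, label), [])]


def dedup(vertices):
    """Keep the first occurrence of each vertex, loosely like dedup()."""
    return list(dict.fromkeys(vertices))


def permissions_for(user):
    # Branch 1: permissions attached directly to the user.
    direct = out([user], "has")
    # Branch 2: permissions attached to the user's groups (fixed depth of two hops).
    via_groups = out(out([user], "isingroup"), "has")
    # Merging both branches plays the role of union(); dedup() removes repeats.
    return dedup(direct + via_groups)


print(permissions_for("u1"))
```

Without the final `dedup`, `p1` and `p2` would each show up twice in this toy data, because they are reachable along more than one route.
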
stephen mallette
  • Will it change anything (apart from the readability of the query)? It's still the same number of edges/vertices to traverse. – macpak Jun 17 '22 at 06:28
  • I preach readability with Gremlin pretty heavily, as there are patterns for certain types of traversal, and if you see those patterns you expect a certain graph structure. I also think Gremlin that fails the readability test tends to be hard to maintain and often doesn't perform as well. Preaching aside, I think that latter point about performance applies here, though whether it is noticeable to you or your users depends on how much data you have and how the graph you are using chooses to optimize the `repeat()`. – stephen mallette Jun 17 '22 at 10:31
  • Typically, I'd expect my approach to be faster because (1) yours requires `simplePath()`, which has to analyze path history, and that costs more memory plus an added filter, and (2) unless the `repeat()` is optimized to unfold, it typically tends to be slower. In other words, `g.V().repeat(out()).times(2)` is slower than `g.V().out().out()`. – stephen mallette Jun 17 '22 at 10:34
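
The extra bookkeeping mentioned in the last comment can be illustrated with a rough Python model of the original `repeat(out().simplePath()).until(hasLabel('permission'))` query. This is only a sketch under assumptions: the graph data, the `LABELS` map, and the function name `repeat_until_permission` are hypothetical, and a real Gremlin engine may schedule and order traversers differently.

```python
# Made-up toy data: u1 has p1 directly and is in g1, which has p1 and p2.
EDGES = {"u1": ["p1", "g1"], "g1": ["p1", "p2"]}
LABELS = {"u1": "user", "g1": "group", "p1": "permission", "p2": "permission"}


def repeat_until_permission(start):
    """Rough model of repeat(out().simplePath()).until(hasLabel('permission'))."""
    results = []
    # Each traverser has to carry its whole path so simplePath() can check it.
    frontier = [(start,)]
    while frontier:
        next_frontier = []
        for path in frontier:
            for nxt in EDGES.get(path[-1], []):
                if nxt in path:
                    # simplePath(): drop traversers that revisit a vertex.
                    continue
                if LABELS[nxt] == "permission":
                    # until(...): emit the traverser and stop looping it.
                    results.append(nxt)
                else:
                    next_frontier.append(path + (nxt,))
        frontier = next_frontier
    return results


print(repeat_until_permission("u1"))
```

In this model every traverser keeps a growing path tuple and pays for a membership check on each hop, which the fixed-depth version never needs. The model also emits `p1` twice for this toy data, since nothing in the original query deduplicates the results.
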