I'm studying AWS Neptune with Gremlin to build a permission system. This system would have basically 3 types of vertices: Users, Permissions and Groups.

  • A group has 0..n permissions
  • A user can have 0..n groups
  • A user can be directly connected to 0..n permissions
  • A user can be connected to another user, in which case it inherits that user's permissions
  • A group can be nested inside another group, which can itself be inside another group, and so on

I'm looking for a performant query to find all permissions for a given user.

This graph may get really huge, so to stress-test it I built a graph with 17 million (17kk) user vertices, created 10 random edges for each of them, and then created a few permissions.

The query I was using to get all permissions is, unsurprisingly, running forever... n_n'

What I'm trying is simply:

g.V('u01')
    .repeat(out())
    .until(hasLabel('Permission'))
    .simplePath()

Is there a better query to achieve it? Or maybe even a better modeling for this scenario?

I was thinking that maybe my 10 random edges per vertex have created a lot of cycles and connections that "make no sense", and that's why the query is slow. Does that make sense?

Thanks in advance!

João Menighin

1 Answer


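You are probably running in circles. Apply `simplePath()` inside the `repeat()` so that paths revisiting a vertex are dropped on every iteration, not only at the end: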

g.V('u01')
    .repeat(out().simplePath())
    .until(hasLabel('Permission'))

It is also preferable to use specific edge labels in the `out()` step to avoid traversing irrelevant paths.
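
As a minimal sketch of that suggestion: the question does not name its edge labels, so `memberOf`, `inheritsFrom` and `hasPermission` below are purely hypothetical placeholders, and the sketch assumes all edges point outward from the user, as in the original query.

```groovy
// Hypothetical edge labels (not from the question); substitute the ones in your graph.
// Only edges with these labels are followed, so unrelated edges are never traversed.
g.V('u01').
  repeat(out('memberOf', 'inheritsFrom', 'hasPermission').simplePath()).
    until(hasLabel('Permission'))
```

Whether this helps depends on how many edges in the graph carry other labels. If every edge already has one of these labels, the traversal visits the same paths as before.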

Kfir Dadosh
  • Hi, it took me a while to test because I had to destroy that graph and recreate it. I actually recreated it a little differently: still the same 17 million (17kk) *user* vertices, but with 3 edges from each node to a random *user* (so roughly 51 million edges). I tried with `simplePath()` but still got a timeout. Any other ideas? `gremlin> g.V('u3').repeat(out().simplePath()).until(hasLabel('Permission')) {"requestId":"b5fd0ed2-f865-4c5f-ad3f-e748c1696258","code":"TimeLimitExceededException","detailedMessage":"A timeout occurred within the script during evaluation."}` – João Menighin Dec 19 '19 at 12:36
  • Since it is a randomly generated graph, it is hard to tell where the problem is. Try limiting the number of hops using `loops().is(eq(4))` inside the `until`. Add `count()` to the end of the query to get only the number of results and avoid the heavy serialization. You can also add `profile()` to get an idea of the number of traversers and the time each step is taking. – Kfir Dadosh Dec 19 '19 at 22:32
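
The hop limit and `count()` from the last comment could be combined roughly as follows. This is a sketch, not a tested fix: the limit of 4 is just the value from the comment, and the extra `hasLabel('Permission')` filter after the loop is an addition here, so that traversers which stopped only because of the hop limit are not counted.

```groovy
// Stop each traverser at a Permission vertex or after 4 hops, whichever comes first.
// Then keep only the traversers that actually reached a Permission and return
// a single count, which avoids serializing every result vertex.
g.V('u3').
  repeat(out().simplePath()).
    until(or(hasLabel('Permission'), loops().is(eq(4)))).
  hasLabel('Permission').
  count()
```

If the same permission is reachable over several paths it is counted once per path. The number therefore reflects paths found, not distinct permissions.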