I'm learning graph databases by building a simple MLM network (basically a user can sponsor another user, and all users have at most one sponsor). I want to run a query that:

  • Goes from a selected user to the users they sponsor until a certain predicate is satisfied, then sums the points of all users along the selected paths into one value (this value should be deduplicated so that a user who branches out to multiple users is not counted twice).
  • Repeats this step 3 times, each time starting from the last users reached in the previous step.
  • Outputs the sums as a list.

I've been trying the following query:

    g.V(userID)
     .repeat(
       repeat(out('sponsors'))
         .until(somePredicate)
         .out('hasPoints')
         .as('level') // How do I know the current loop iteration so I can store level1/level2/level3 in as step dynamically?
         // This is where I'm stuck, since I have no idea how to capture and sum all the points in this subtree.
         .in('hasPoints')
     )
     .times(3)
     // Also need to output the point sums as a list/map here, e.g. ["level1": 100, "level2": 100],
     // "level1" being the first iteration of repeat and so on.

Any pointers?

EDIT:

Here's a Gremlin script for sample data:

g.addV('user').property('id', 1).as('1').
  addV('user').property('id', 2).as('2').
  addV('user').property('id', 3).as('3').
  addV('user').property('id', 4).as('4').
  addV('user').property('id', 5).as('5').
  addV('user').property('id', 6).as('6').
  addV('user').property('id', 7).as('7').
  addV('point').property('value', 5).as('p1').
  addV('point').property('value', 5).as('p2').
  addV('point').property('value', 5).as('p3').
  addV('point').property('value', 5).as('p4').
  addV('point').property('value', 5).as('p5').
  addV('point').property('value', 5).as('p6').
  addV('point').property('value', 5).as('p7').
  addE('sponsors').from('1').to('2').
  addE('sponsors').from('1').to('3').
  addE('sponsors').from('1').to('4').
  addE('sponsors').from('2').to('5').
  addE('sponsors').from('3').to('6').
  addE('sponsors').from('4').to('7').
  addE('hasPoints').from('1').to('p1').
  addE('hasPoints').from('2').to('p2').
  addE('hasPoints').from('3').to('p3').
  addE('hasPoints').from('4').to('p4').
  addE('hasPoints').from('5').to('p5').
  addE('hasPoints').from('6').to('p6').
  addE('hasPoints').from('7').to('p7').
  iterate()

This is a query that I'm writing to group levels together based on some predicate:

g.V()
    .has('id', 1)
    .repeat('x',
        identity()
            .repeat(
                out('sponsors')
                    .choose(loops('x'))
                    .option(0, identity().as('a1'))
                    .option(1, identity().as('a2'))
                    .option(2, identity().as('a3'))
            )
            .until(or(out('hasPoints').has('value', gte(5))))
            .sideEffect(
                choose(loops('x'))
                    .option(0, select(all, 'a1'))
                    .option(1, select(all, 'a2'))
                    .option(2, select(all, 'a3'))
                    .unfold()
                    .choose(loops('x'))
                    .option(0, store('b1'))
                    .option(1, store('b2'))
                    .option(2, store('b3'))
            )
    )
    .times(3)
    .cap('b1', 'b2', 'b3')

I can set and choose the right variables manually, but I don't know how to do this dynamically yet. For example, if I need until() instead of times(3), the iteration count is no longer known beforehand.

Hai Pham
  • Could you please provide a Gremlin script that creates some sample data - here is an example https://stackoverflow.com/questions/51388315/gremlin-choose-one-item-at-random – stephen mallette Apr 15 '20 at 10:59
  • @stephenmallette I have provided the sample data :) – Hai Pham Apr 15 '20 at 17:04
  • thanks for the data. you show in your updated question you're trying to "group levels together" but that sounds like a different goal from what you wrote in your original question toward the top. if you just need to group levels together that's not too difficult and can be done in the dynamic fashion you're looking for without the complexity that you currently have. could you please clarify what you're looking for in an answer? – stephen mallette Apr 15 '20 at 19:00
  • Hi @stephenmallette, the grouping of levels and dynamic setting of variables are what I’m looking for. I did not show in the query but I also need to sum the points in the grouped levels together, instead of the HashMap output by cap :) – Hai Pham Apr 16 '20 at 01:38

1 Answer

I've modified your data slightly to include a single "point" value less than 5 to prove that it was filtering properly and changed the "id" property to T.id so that results were easier to read while I was testing things:

g.addV('user').property(id, 1).as('1').
  addV('user').property(id, 2).as('2').
  addV('user').property(id, 3).as('3').
  addV('user').property(id, 4).as('4').
  addV('user').property(id, 5).as('5').
  addV('user').property(id, 6).as('6').
  addV('user').property(id, 7).as('7').
  addV('point').property('value', 5).as('p1').
  addV('point').property('value', 5).as('p2').
  addV('point').property('value', 5).as('p3').
  addV('point').property('value', 5).as('p4').
  addV('point').property('value', 5).as('p5').
  addV('point').property('value', 4).as('p6').
  addV('point').property('value', 5).as('p7').
  addE('sponsors').from('1').to('2').
  addE('sponsors').from('1').to('3').
  addE('sponsors').from('1').to('4').
  addE('sponsors').from('2').to('5').
  addE('sponsors').from('3').to('6').
  addE('sponsors').from('4').to('7').
  addE('hasPoints').from('1').to('p1').
  addE('hasPoints').from('2').to('p2').
  addE('hasPoints').from('3').to('p3').
  addE('hasPoints').from('4').to('p4').
  addE('hasPoints').from('5').to('p5').
  addE('hasPoints').from('6').to('p6').
  addE('hasPoints').from('7').to('p7').
  iterate()

If you just need to group dynamically based on the level iterated by repeat() then you can just group() on loops():

gremlin> g.V(1).
......1>   repeat(out('sponsors').
......2>          group('m').
......3>            by(loops()).
......4>            by(out('hasPoints').has('value',gte(5)).
......5>               values('value').sum())).
......6>   cap('m')
==>[0:15,1:10]
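
For intuition, the grouping above can be modelled in plain Python. This is only a sketch of what the traversal computes, not of how Gremlin executes it. The dictionaries `sponsors` and `points` and the function `sums_by_level` are hypothetical stand-ins for the modified sample graph, not part of any Gremlin API.

```python
# Plain-Python model of the modified sample graph (not Gremlin itself).
sponsors = {1: [2, 3, 4], 2: [5], 3: [6], 4: [7]}    # 'sponsors' edges
points = {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 4, 7: 5}  # 'hasPoints' values


def sums_by_level(start, min_value=5):
    """Sum qualifying points per repeat() iteration, keyed like loops()."""
    result = {}
    frontier = [start]
    level = 0
    while True:
        # One repeat() iteration: follow every outgoing 'sponsors' edge.
        frontier = [child for user in frontier for child in sponsors.get(user, [])]
        if not frontier:
            break  # no traversers left, so the loop ends on its own
        # Models group('m').by(loops()).by(<sum of point values >= min_value>).
        result[level] = sum(points[u] for u in frontier if points[u] >= min_value)
        level += 1
    return result


print(sums_by_level(1))  # {0: 15, 1: 10}
```

Because the key is the loop counter rather than a hand-written label, the same code works however many levels the graph has.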

You mention that you'd like those values summed, which is easy enough:

gremlin> g.V(1).
......1>   repeat(out('sponsors').
......2>          group('m').
......3>            by(loops()).
......4>            by(out('hasPoints').has('value',gte(5)).
......5>               values('value').sum())).
......6>   cap('m').
......7>   unfold().
......8>   select(values).
......9>   sum()
==>25

Of course if you just need the total you can avoid group() completely:

gremlin> g.V(1).
......1>   repeat(out('sponsors').
......2>          store('m').
......3>            by(coalesce(out('hasPoints').has('value',gte(5)).values('value'), 
......4>                        constant(0)))).
......5>   cap('m').
......6>   sum(local)
==>25

Finally, if we no longer care about levels then we can probably go one better, get rid of the side-effect "m" completely, and save that overhead:

gremlin> g.V(1).
......1>   repeat(out('sponsors')).
......2>     emit().
......3>   out('hasPoints').has('value',gte(5)).
......4>   values('value'). 
......5>   sum()
==>25
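
As a rough mental model of that last query, repeat(...).emit() hands every user reached after each iteration to the rest of the traversal, but not the start user. The Python sketch below mirrors that under the same assumptions as the modified sample data; `sponsors`, `points` and `descendants` are hypothetical names chosen for illustration.

```python
# Plain-Python model of the modified sample graph (not Gremlin itself).
sponsors = {1: [2, 3, 4], 2: [5], 3: [6], 4: [7]}    # 'sponsors' edges
points = {1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 4, 7: 5}  # 'hasPoints' values


def descendants(start):
    """Yield every user reachable via 'sponsors' edges, excluding the start."""
    stack = list(sponsors.get(start, []))
    while stack:
        user = stack.pop()
        yield user  # models emit(): pass this user on to later steps
        stack.extend(sponsors.get(user, []))


# Models out('hasPoints').has('value', gte(5)).values('value').sum()
total = sum(points[u] for u in descendants(1) if points[u] >= 5)
print(total)  # 25
```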
stephen mallette
  • Hi @stephenmallette, your query gave me some ideas to adapt to my case. I will accept this answer :) I notice that there's no times or until in the repeat block. May I know why? The documentation doesn't state that it is possible to omit them. – Hai Pham Apr 16 '20 at 12:52
  • if you don't supply an end condition `repeat()` will run until it exhausts the child traversal supplied to it. That's a bit dangerous because with graphs you could easily stumble onto a cycle in the graph and continue indefinitely. In this case, I didn't bother to add an end condition because I knew the data well (small static toy graph) and it contained no cycles. In practice, even if you know the data does not contain cycles it's probably smart to include some kind of break condition for safety. As we don't recommend omitting a break condition, we probably didn't document it as an option. – stephen mallette Apr 16 '20 at 13:05
  • Got it. It'd be great if you could offer some intuition into these problems, because right now I'm finding some concepts in Gremlin a bit hard to grasp :) – Hai Pham Apr 16 '20 at 13:37
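
For some intuition on the end-condition comment above, here is a minimal Python sketch of why an unbounded loop is risky on a cyclic graph and how a break condition helps. It is a model only, not Gremlin. The graph `cyclic`, the function `reachable`, the depth cap and the visited set are all assumptions made for illustration; the depth cap plays the role of a times(n) limit.

```python
def reachable(graph, start, max_depth=10):
    """Collect users reachable from start, stopping after max_depth hops."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):  # break condition, similar in spirit to times(n)
        next_frontier = []
        for user in frontier:
            for child in graph.get(user, []):
                if child not in seen:  # skip users that were already visited
                    seen.add(child)
                    next_frontier.append(child)
        if not next_frontier:
            break  # nothing new to visit, so stop early
        frontier = next_frontier
    seen.discard(start)
    return seen


# Hypothetical graph with a cycle: 1 -> 2 -> 3 -> 1
cyclic = {1: [2], 2: [3], 3: [1]}
print(sorted(reachable(cyclic, 1)))  # [2, 3]
```

Without the depth cap or the visited check, the loop over `cyclic` would never run out of users to visit, which is the situation the comment warns about.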