How to calculate aggregates for sub-trees in Gremlin?

Question

I have a tree with many levels, where leaf nodes might have property "count". I want to calculate total count for each sub-tree, and cache those values in the root node of each sub-tree. Is that possible in Gremlin?

Which version of Gremlin, 2.x or 3.x? – stephen mallette Sep 28 '15 at 11:31
Gremlin 2.x would be preferable. – isobretatel Sep 28 '15 at 15:06

stephen mallette · Accepted Answer · 2015-09-29T16:19:35.310

You could do it with a `sideEffect`; that's pretty straightforward. We set up a simple tree with:

gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = g.addVertex()
==>v[0]
gremlin> v2 = g.addVertex()
==>v[1]
gremlin> v3 = g.addVertex([count:2])
==>v[2]
gremlin> v4 = g.addVertex([count:3])
==>v[3]
gremlin> v1.addEdge('child',v2)
==>e[4][0-child->1]
gremlin> v1.addEdge('child',v3)
==>e[5][0-child->2]
gremlin> v2.addEdge('child',v4)
==>e[6][1-child->3]

And then here's the calculation over each subtree within the full tree:

gremlin> g.V().filter{it.outE().hasNext()}.sideEffect{
gremlin>   c=0;
gremlin>   it.as('a').out().sideEffect{leaf -> c+=(leaf.getProperty('count')?:0)}.loop('a'){true}.iterate()
gremlin>   it.setProperty('total',c)
gremlin> }
==>v[0]
==>v[1]
gremlin> g.v(0).total
==>5
gremlin> g.v(1).total
==>3

That query breaks down like this. First, this piece:

g.V().filter{it.outE().hasNext()}

selects every vertex that is not a leaf, i.e. every vertex with at least one outgoing edge; each of these is the root of a subtree. Second, we use `sideEffect` to process each subtree root:

it.as('a').out().sideEffect{leaf -> c+=(leaf.getProperty('count')?:0)}.loop('a'){true}.iterate()

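This walks every descendant of that root and accumulates the sum of the "count" property in a variable called c. The `loop('a'){true}` keeps feeding vertices back into `out()`, so the walk ends only when there are no more outgoing edges to follow, and `iterate()` forces the lazy pipeline to actually run. There's a bit of Groovy goodness there with the Elvis operator (?:), which returns zero for vertices that have no "count" property. Once the traversal has computed c, you can store it in the root vertex of the subtree via: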

it.setProperty('total',c)
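To make the logic of that traversal explicit, here is a minimal sketch in plain Groovy with no TinkerPop dependency. It is not a Gremlin traversal: it assumes a hypothetical in-memory copy of the example tree, where `children` maps each vertex id to its child ids and `counts` holds the "count" property for the vertices that have one. The names `children`, `counts`, `subtreeTotal` and `totals` are illustrative only. Like the query above, it treats every vertex with outgoing edges as a subtree root and sums "count" over all of that root's descendants.

```groovy
// Hypothetical in-memory copy of the example tree: 0 -> 1, 0 -> 2, 1 -> 3
def children = [0: [1, 2], 1: [3], 2: [], 3: []]
// Only the leaves 2 and 3 carry a "count" property
def counts = [2: 2, 3: 3]

// Sum "count" over every descendant of root (root itself excluded),
// mirroring out().sideEffect{...}.loop('a'){true}
def subtreeTotal = { root ->
    def c = 0
    def pending = new ArrayDeque(children[root])
    while (!pending.isEmpty()) {
        def v = pending.removeFirst()
        c += (counts[v] ?: 0)        // Elvis operator: a missing count adds 0
        pending.addAll(children[v])  // keep walking down the tree
    }
    c
}

// Equivalent of filter{it.outE().hasNext()} followed by setProperty('total', c)
def totals = [:]
children.each { v, kids ->
    if (kids) {
        totals[v] = subtreeTotal(v)
    }
}

println totals   // [0:5, 1:3]
```

Each subtree root starts its own walk, so a vertex is visited once for every ancestor it has; that is the same repeated work the Gremlin query does.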

edited Sep 29 '15 at 16:19

answered Sep 28 '15 at 19:40

stephen mallette

>>you can just store the value of c in your root node of the subtree<< How? – isobretatel Sep 29 '15 at 01:14
`v1.setProperty('total',c)` - do you need something more? – stephen mallette Sep 29 '15 at 10:34
Yes: calculate and cache those values for _each_ sub-tree, not for the whole tree. – isobretatel Sep 29 '15 at 15:24
I need to modify the query: the 'count' property belongs to vertices that the leaf vertices 'know': `v1 = g.addVertex(); v2 = g.addVertex(); v3 = g.addVertex(); v4 = g.addVertex(); v5 = g.addVertex([count:3]); v6 = g.addVertex([count:2]); v1.addEdge('child',v2); v1.addEdge('child',v3); v2.addEdge('child',v4); v3.addEdge('knows',v5); v4.addEdge('knows',v6)` – isobretatel Sep 30 '15 at 21:34
Adding edge labels doesn't really change the outcome of the query I provided. It works independently of the edge labels as it is and I don't see a reason to constrain it to labels. Are you thinking that something no longer works as a result of the addition of the edge labels? – stephen mallette Oct 01 '15 at 10:48
The original model was oversimplified. The real model has two trees connected to each other with specific relationships, so the query has to reference specific edge labels like "knows". Example: `v1 = g.addVertex(); v2 = g.addVertex(); v3 = g.addVertex(); v4 = g.addVertex(); v5 = g.addVertex([count:3]); v6 = g.addVertex([count:2]); v1.addEdge('child',v2); v1.addEdge('child',v3); v2.addEdge('child',v4); v5.addEdge('knows',v3); v6.addEdge('knows',v4)` Here v5 and v6 are part of another tree, not shown here. – isobretatel Oct 01 '15 at 13:34
Just filter on the edge labels then: `it.as('a').out('child','knows').sideEffect{leaf -> c+=(leaf.getProperty('count')?:0)}.loop('a'){true}.iterate()` This constrains your traversal to just those edge labels: 'child' and 'knows'. – stephen mallette Oct 01 '15 at 13:53
What if the edge is in another direction: leaf.in('knows').getProperty('count') ? – isobretatel Oct 01 '15 at 13:56
I guess this is a way you could do it: `it.as('a').out('child').loop('a'){true}{true}.in('knows').sideEffect{leaf -> c+=leaf.getProperty('count')}.iterate()` – stephen mallette Oct 01 '15 at 14:36
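The `in('knows')` variant in the comment above can be sketched the same way in plain Groovy, again without TinkerPop. This assumes a hypothetical in-memory copy of the model from the comments: `children` holds the 'child' edges, `knowsIn` maps a tree vertex to the vertices whose 'knows' edge points at it, and `counts` holds "count" for v5 and v6. All of these names are illustrative. One difference worth noting: `filter{it.outE().hasNext()}` would also select v5 and v6, because their 'knows' edges are outgoing, and they would end up with a total of 0; the sketch only treats vertices with outgoing 'child' edges as subtree roots.

```groovy
// Hypothetical copy of the model: v1 -child-> v2, v1 -child-> v3, v2 -child-> v4,
// plus v5 -knows-> v3 and v6 -knows-> v4
def children = [1: [2, 3], 2: [4], 3: [], 4: []]
def knowsIn = [3: [5], 4: [6]]   // tree vertex -> vertices with a 'knows' edge into it
def counts = [5: 3, 6: 2]        // "count" lives on v5 and v6, outside the tree

def subtreeTotal = { root ->
    def c = 0
    def pending = new ArrayDeque(children[root])
    while (!pending.isEmpty()) {
        def v = pending.removeFirst()        // every descendant, like loop('a'){true}{true}
        (knowsIn[v] ?: []).each { k ->       // like in('knows')
            c += (counts[k] ?: 0)
        }
        pending.addAll(children[v])
    }
    c
}

// Only vertices with outgoing 'child' edges count as subtree roots
def totals = children.findAll { v, kids -> kids }
                     .collectEntries { v, kids -> [v, subtreeTotal(v)] }

println totals   // [1:5, 2:2]
```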
Is it possible to visit each subtree only once? – isobretatel Oct 01 '15 at 15:26
I suppose it's possible, but I think the traversal would have to be reworked considerably and wouldn't be as easy to follow as the approach I provided. I think you probably need to spend more time playing around with Gremlin to start seeing how you would approach it that way. – stephen mallette Oct 01 '15 at 15:40
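On the question of visiting each subtree only once: outside of Gremlin, the usual approach is a bottom-up (post-order) walk, where the total for a vertex is built from the already computed totals of its children, so every vertex is visited exactly once. The sketch below is plain Groovy, not a Gremlin traversal, and assumes a hypothetical in-memory copy of the example tree from the answer (`children`, `counts`), a single known root (vertex 0), and a hypothetical `visit` closure.

```groovy
// Hypothetical in-memory copy of the example tree: 0 -> 1, 0 -> 2, 1 -> 3
def children = [0: [1, 2], 1: [3], 2: [], 3: []]
def counts = [2: 2, 3: 3]

// total(v) = sum over each child w of (count(w) + total(w)),
// so a single post-order walk from the root visits each vertex once
def totals = [:]
def visit
visit = { v ->
    def c = 0
    children[v].each { w ->
        c += (counts[w] ?: 0) + visit(w)
    }
    if (children[v]) {
        totals[v] = c   // cache only on subtree roots, like setProperty('total', c)
    }
    c
}

visit(0)
println totals   // [1:3, 0:5]
```

Recursion keeps the sketch short; a very deep tree might need an explicit stack instead.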
