
I have a tree data structure in a graph, as shown in the diagram below. Each color represents vertices with a different label, related as employee -> app -> project -> pv -> scan.

Question #1:

I want to find all leaf nodes (the ones in green) under the top node 0.

I tried the code below, which loops with repeat(), but it returns all nodes with the label employee, not just the leaf nodes.

g.V().has('person', 'id', '0').repeat(__.in('reportsTo')).emit().values('id')

The sample graph can be found in GremlinBin.

How do I find all green leaf nodes?

Update #1:

As mentioned in the comments, I tried the tree pattern, but it doesn't let me call getLeafObjects() on the tree, and I'm not sure what's missing. Also, I am again only able to build a tree of employee nodes. How do I traverse down to the scan nodes?

> tree = g.V().has('person', 'id', '0').repeat(__.in('reportsTo')).emit().tree()
>  tree.getLeafObjects()
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.getLeafObjects() is applicable for argument types: () values: []
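The error says the method was looked up on DefaultGraphTraversal: assigning a traversal to a variable in the console does not iterate it, so `tree` holds the traversal rather than the Tree result that getLeafObjects() belongs to. The traversal has to be iterated (for example with next()) to produce that Tree. Conceptually, the Tree is a nested map from each vertex to the subtree reached from it, and its leaf objects are the keys whose subtree is empty. Because this traversal only follows reportsTo, those leaves would be the bottom-level employees, not the scans. The plain-Python sketch below models only the leaf-collection idea with nested dicts. It is not TinkerPop code, and the vertex ids are hypothetical.

```python
from typing import Dict, List

# A tree modeled like TinkerPop's Tree: each key maps to the subtree below it.
Tree = Dict[str, "Tree"]


def leaf_objects(tree: Tree) -> List[str]:
    """Collect every key whose subtree is empty (the leaves), depth-first."""
    leaves: List[str] = []
    for node, subtree in tree.items():
        if subtree:
            leaves.extend(leaf_objects(subtree))
        else:
            leaves.append(node)
    return leaves


# Hypothetical reportsTo hierarchy rooted at employee "0".
sample: Tree = {
    "0": {
        "1": {"3": {}, "4": {}},
        "2": {},
    }
}

print(leaf_objects(sample))  # ['3', '4', '2']
```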

Question #2:

How do I retrieve one child vertex among the children of each parent, based on max(id)? In my sample graph, each black vertex can have one or more green child vertices. I want to find the green vertex with the max(property) under each black vertex.

[Diagram of the sample graph: leaf (scan) vertices in green, their parent vertices in black]

indusBull
  • Those are all scans. Is that your definition of "green"? That list of scans was part of your input in the first place. You might simply query for the label of the vertex. – Wolfgang Fahl Jan 11 '18 at 15:58
  • That would work if I wanted to find all scan nodes under the root, but it would not work if I want to find all scan nodes under a particular employee node, not just the root. – indusBull Jan 11 '18 at 16:02
  • https://github.com/tinkerpop/gremlin/wiki/Tree-Pattern might help you; there is a helper method for getting leaves. – Wolfgang Fahl Jan 11 '18 at 16:06

1 Answer


I think you just need to modify your emit(). Without an argument, emit() emits every vertex the repeat() reaches. If you only want leaf vertices, pass a predicate such as not(outE()), which emits a vertex only if it has no outgoing edges, i.e. only if it is a leaf. You might need to make your emit() predicate a bit smarter, since your schema seems to have different types of vertices with different rules for what makes a vertex a leaf.
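To illustrate what the emit() predicate changes, here is a plain-Python model (not Gremlin) of repeat(out()).emit() over a hypothetical adjacency dict. With no predicate, every vertex reached is emitted. With a leaf predicate, the analogue of not(outE()), only vertices without outgoing edges are kept. The graph, vertex names and helper functions are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical adjacency list: vertex -> vertices reached over outgoing edges.
# Assumes a tree (no cycles), so the walk always terminates.
Graph = Dict[str, List[str]]


def repeat_out_emit(
    graph: Graph,
    start: str,
    predicate: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Rough model of repeat(out()).emit(predicate), walking level by level.

    The start vertex is never emitted, as with emit() placed after repeat().
    With predicate=None every vertex reached is emitted. Emission order is
    not meant to match Gremlin's.
    """
    emitted: List[str] = []
    frontier = [start]
    while frontier:
        next_frontier: List[str] = []
        for vertex in frontier:
            for child in graph.get(vertex, []):
                if predicate is None or predicate(child):
                    emitted.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return emitted


graph: Graph = {
    "emp": ["app"],
    "app": ["project"],
    "project": ["pv1", "pv2"],
    "pv1": ["scan1"],
    "pv2": ["scan2", "scan3"],
}


def is_leaf(vertex: str) -> bool:
    """Analogue of not(outE()): the vertex has no outgoing edges."""
    return not graph.get(vertex)


print(repeat_out_emit(graph, "emp"))           # every vertex below "emp"
print(repeat_out_emit(graph, "emp", is_leaf))  # only the scans
```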

Given the sample graph you had in GremlinBin, I did this to get all the green vertices at the bottom of your picture above:

g.V().has('employee','id',1).
  repeat(__.in('reportsTo')).emit().
  repeat(out('has')).emit(__.not(outE('has')))
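The roles of the two chained repeat() steps can be made explicit with a small plain-Python model. This is a sketch, not Gremlin, and the graph and ids are hypothetical. The first phase walks down the reportsTo hierarchy (edges point from subordinate to manager, hence __.in()) and emits every employee below the start. The second phase follows has edges from each of those employees and keeps only vertices with no further has edge. Because emit() comes after repeat(), the starting employee itself is not emitted, so leaves reachable only through its own apps are skipped. Writing emit() before repeat() in Gremlin also emits the starting vertex. The include_start flag models that choice.

```python
from typing import Dict, List

# Hypothetical graph. reports_to maps a manager to the employees reporting to
# them (the direction __.in('reportsTo') walks); has maps a vertex to the
# vertices it points to over 'has' edges (the direction out('has') walks).
reports_to: Dict[str, List[str]] = {"e1": ["e2", "e3"], "e2": ["e4"]}
has: Dict[str, List[str]] = {
    "e1": ["app1"],
    "e2": ["app2"],
    "e4": ["app4"],
    "app1": ["scan1"],
    "app2": ["proj2"],
    "proj2": ["scan2"],
    "app4": ["scan4"],
}


def employees_below(root: str, include_start: bool) -> List[str]:
    """Model of repeat(__.in('reportsTo')).emit(), optionally emitting root."""
    result = [root] if include_start else []
    frontier = [root]
    while frontier:
        frontier = [sub for mgr in frontier for sub in reports_to.get(mgr, [])]
        result.extend(frontier)
    return result


def has_leaves(vertex: str) -> List[str]:
    """Model of repeat(out('has')).emit(__.not(outE('has'))) from one vertex."""
    leaves: List[str] = []
    stack = list(has.get(vertex, []))
    while stack:
        current = stack.pop()
        children = has.get(current, [])
        if not children:
            leaves.append(current)  # no outgoing 'has' edge: a leaf
        stack.extend(children)
    return leaves


def leaf_scans(root: str, include_start: bool = False) -> List[str]:
    """Chain both phases, like the two repeat() steps in the query above."""
    return [
        leaf
        for employee in employees_below(root, include_start)
        for leaf in has_leaves(employee)
    ]


print(leaf_scans("e1"))                      # misses scan1 under e1 itself
print(leaf_scans("e1", include_start=True))  # includes scan1
```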

In answer to your second question, you could extend the above to:

g.V().has('employee','id',1).
  repeat(__.in('reportsTo')).emit().
  repeat(out('has')).emit(__.not(outE('has'))).
  group().
    by(__.in('has')).
  select(values).
  unfold().
  order(local).
    by('id',decr).
  local(unfold().limit(1))

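Basically, group the leaf vertices by their parent vertex, then take the values of the resulting map, which are the lists of leaves per parent. The unfold() step turns that collection into a stream of individual lists, order(local) sorts the vertices within each list by the property you care about (in this case "id"), and local(unfold().limit(1)) picks the first item of each ordered list. In later TinkerPop releases, decr is deprecated in favor of desc.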
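The same group-then-pick logic can be sketched in plain Python over hypothetical (parent, id) records. This is not Gremlin, but it may help when checking results or post-processing on the client. Leaves are bucketed by parent, each bucket is ordered by id descending, and the first element is kept. In the Gremlin query, the grouping key is the parent vertex returned by __.in('has').

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Leaf(NamedTuple):
    parent: str  # stands in for the vertex reached by __.in('has')
    id: int      # the property used for ordering


def max_leaf_per_parent(leaves: List[Leaf]) -> Dict[str, Leaf]:
    # group().by(__.in('has')): bucket each leaf under its parent.
    groups: Dict[str, List[Leaf]] = defaultdict(list)
    for leaf in leaves:
        groups[leaf.parent].append(leaf)
    # order(local).by('id', decr) then limit(1): keep the highest id per group.
    return {
        parent: sorted(group, key=lambda leaf: leaf.id, reverse=True)[0]
        for parent, group in groups.items()
    }


leaves = [
    Leaf("pv1", 7),
    Leaf("pv1", 9),
    Leaf("pv2", 3),
    Leaf("pv2", 5),
    Leaf("pv2", 4),
]
print(max_leaf_per_parent(leaves))
```

Sorting only to take the first element mirrors the Gremlin steps one-to-one. In plain Python, max(group, key=...) would give the same result in a single pass.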

stephen mallette
  • Something may still not be clear to me about what you're asking, but I updated my answer. Is that what you are looking for? – stephen mallette Jan 11 '18 at 16:52
  • Your sample query led me in the right direction, and I was able to come up with a query that returns the desired results. Thanks! I have added a follow-up question in the main body. It would be great if you could provide some suggestions. – indusBull Jan 11 '18 at 21:46
  • I deleted my first comment. I had accidentally pasted query with actual id :-( – indusBull Jan 11 '18 at 21:48
  • Thanks, that works perfectly, although it takes 90s to traverse a 350K-node graph. I will have to look into optimization. – indusBull Jan 12 '18 at 18:59
  • I had almost exactly the same problem: a "healthcare provider network" owns a "bunch of hospitals", which in turn have "satellite clinics", and I want to traverse down from any point in that tree, e.g. "list all facilities" and "list clinics managed by a hospital". This solves the same issue. +1 to both the great question (the graphics helped a lot) and the answer. – Manabu Tokunaga Jul 15 '21 at 00:55