
I am trying to write a query that retrieves all paths that are reachable from a specified vertex. In other words I am trying to retrieve the entire cluster/sub-graph that the vertex is connected to. A couple more constraints on the query are:

  1. inward edges should be traversed and included in the result (I am looking for all paths that are in any way connected to the root vertex).
  2. the search must stop at a specified depth of, say, 10 hops from the root vertex.
  3. Bonus constraint: I would prefer the result not to include paths which are complete sub-paths of other paths returned in the result.
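To make constraint 3 concrete, here is a minimal plain-Python sketch of the filtering I have in mind, applied after the paths come back from the query. It assumes that every path starts at the same root, so a "complete sub-path" is simply a strict prefix of a longer path, and that path elements are hashable values such as ids or labels (not the dicts produced by `valueMap`). The name `drop_sub_paths` is hypothetical.

```python
def drop_sub_paths(paths):
    """Keep only paths that are not a strict prefix of another returned path."""
    # Remove exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(tuple(p) for p in paths))
    kept = []
    for p in unique:
        # p is redundant if some longer path begins with exactly the same elements.
        is_prefix = any(len(q) > len(p) and q[:len(p)] == p for q in unique)
        if not is_prefix:
            kept.append(list(p))
    return kept


# Illustrative input only: alternating vertex and edge labels.
paths = [
    ["root", "left", "b"],
    ["root", "left", "b", "left", "c"],
    ["root", "right", "d"],
]
maximal_paths = drop_sub_paths(paths)
print(maximal_paths)
```

This compares every pair of paths, so it is quadratic in the number of paths; that is fine for a sketch but may need a trie or sorting for very large results.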

I currently have the following two queries, which appear to work as expected on the small toy graphs I have tested them on. However, there seem to be edge cases in our large production graph where they do not return all the paths/edges/vertices I would expect, and I cannot explain why. The two queries also sometimes return different vertices from each other.

I would prefer a fresh view on how to approach this query, rather than trying to adjust what I currently have, so please try to provide a solution before looking at my current solution below.

Query 1:

g.V(uid).repeat(bothE().bothV().simplePath()).until(loops().is_(10)).emit().dedup().path().by(valueMap(True))

Query 2:

g.V(uid).repeat(bothE().bothV().simplePath()).until(bothE().simplePath().count().is_(0).or_().loops().is_(10)).dedup().path().by(valueMap(True))

KOB
  • Are you wanting to get all the paths back (that will contain duplicate vertices) or the actual subgraph structure? – Kelvin Lawrence Sep 15 '20 at 16:48
  • I am open to either. What exactly do you mean by 'subgraph structure'? Are you planning on using the actual Gremlin `.subgraph()` function? – KOB Sep 15 '20 at 17:02
  • Gremlin has a `subgraph` step that will return the subgraph visited by a query yes - so as an alternative to `path` you could opt to return the subgraph if you want to rule out any part of the paths being duplicate. – Kelvin Lawrence Sep 15 '20 at 18:34
  • Could you provide a query that would do this ensuring that all edges/vertices reachable are returned? – KOB Sep 15 '20 at 18:55
  • I added an answer below showing an example of both `path` and `subgraph` – Kelvin Lawrence Sep 15 '20 at 19:13
  • @KelvinLawrence I have a performance question related to this, and I was wondering if you could take a look? https://stackoverflow.com/questions/71575494/performance-expectations-for-finding-all-paths-from-a-node-with-gremlin-in-aws Thanks! – wless1 Mar 22 '22 at 16:22
  • Will take a look ASAP – Kelvin Lawrence Mar 23 '22 at 02:13

1 Answer

Using this simple binary tree as a test graph

g.addV('root').property('data',9).as('root').
  addV('node').property('data',5).as('b').
  addV('node').property('data',2).as('c').
  addV('node').property('data',11).as('d').
  addV('node').property('data',15).as('e').
  addV('node').property('data',10).as('f').
  addV('node').property('data',1).as('g').
  addV('node').property('data',8).as('h').
  addV('node').property('data',22).as('i').
  addV('node').property('data',16).as('j').
  addV('node').property('data',7).as('k').
  addV('node').property('data',51).as('l').  
  addV('node').property('data',13).as('m'). 
  addV('node').property('data',4).as('n'). 
  addE('left').from('root').to('b').
  addE('left').from('b').to('c').
  addE('right').from('root').to('d').
  addE('right').from('d').to('e').
  addE('right').from('e').to('i').
  addE('left').from('i').to('j').
  addE('left').from('d').to('f').
  addE('right').from('b').to('h').
  addE('left').from('h').to('k').
  addE('right').from('i').to('l').
  addE('left').from('e').to('m').
  addE('right').from('c').to('n').
  addE('left').from('c').to('g').iterate()

We could find all the paths using

gremlin>   g.V().hasLabel('root').
......1>   repeat(bothE().otherV().simplePath()).
......2>   until(__.not(bothE().simplePath())).
......3>   path().
......4>     by('data').
......5>     by(label) 

==>[9,right,11,left,10]
==>[9,left,5,left,2,left,1]
==>[9,left,5,left,2,right,4]
==>[9,left,5,right,8,left,7]
==>[9,right,11,right,15,left,13]
==>[9,right,11,right,15,right,22,left,16]
==>[9,right,11,right,15,right,22,right,51]   

Note that I used bothE().otherV() as you said in your case you may have some incoming edges as well as outgoing ones.

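We could also use the subgraph step to return the whole subgraph containing both vertices and edges. This example starts at the vertex with the value 5. Because edges are followed in both directions, the traversal also climbs back up through the root, so the result covers the entire tree rather than only the nodes below 5.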

gremlin>   g.V().has('data',5).
......1>   repeat(bothE().subgraph('sg').otherV().simplePath()).
......2>   until(__.not(bothE().simplePath())).
......3>   cap('sg') 
==>tinkergraph[vertices:14 edges:13]  

Note that both of these approaches assume that all paths end at leaf nodes. I left out the loops() test but you can add that in as needed.
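To illustrate what adding a hop limit does to the result, here is a rough plain-Python model of the path query on the same test tree. It is only a sketch and not an emulation of Gremlin: vertices are identified by their 'data' value purely for illustration, `bounded_paths` and `max_hops` are hypothetical names, and on graphs with cycles the real traversal may behave differently from this model.

```python
# Edges of the test tree as (label, from_data, to_data).
EDGES = [
    ("left", 9, 5), ("left", 5, 2), ("right", 9, 11), ("right", 11, 15),
    ("right", 15, 22), ("left", 22, 16), ("left", 11, 10), ("right", 5, 8),
    ("left", 8, 7), ("right", 22, 51), ("left", 15, 13), ("right", 2, 4),
    ("left", 2, 1),
]


def bounded_paths(edges, start, max_hops):
    """Enumerate simple paths from start, following edges in both directions.

    A path is emitted when it cannot be extended without revisiting a vertex,
    or when it has reached max_hops edges.
    """
    adjacency = {}
    for label, src, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        adjacency.setdefault(dst, []).append((label, src))  # inward edges too

    results = []

    def walk(path, visited, hops):
        vertex = path[-1]
        # Candidate next steps that keep the path free of repeated vertices.
        steps = [(lbl, nxt) for lbl, nxt in adjacency.get(vertex, [])
                 if nxt not in visited]
        if not steps or hops == max_hops:
            results.append(path)
            return
        for lbl, nxt in steps:
            walk(path + [lbl, nxt], visited | {nxt}, hops + 1)

    walk([start], {start}, 0)
    return results


full = bounded_paths(EDGES, 9, max_hops=10)   # limit never reached on this tree
capped = bounded_paths(EDGES, 9, max_hops=2)  # every path is cut at two hops
print(len(full), len(capped))
```

With a limit larger than the depth of the tree, the model yields the same root-to-leaf paths as the query above. With a small limit, paths are cut short, so vertices and edges beyond the limit are simply absent from the result.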

Kelvin Lawrence
  • Thanks. I am using `gremlin_python` so unfortunately I don't think the subgraph method will work, since I need the properties of the edges and vertices too. I'll look into your first method and see how it compares to my existing solutions. I have also seen `SubgraphStrategy` and will see if I can use this here, but I don't think you can use a query as complex as these for SubgraphStrategy, just simple conditions. – KOB Sep 16 '20 at 06:04
  • "Note that both of these approaches assumes that all paths end at leaf nodes.". I don't think this will work for me, since in my graph, the following example is theoretically a subgraph which could exist. Your solution doesn't return anything https://gremlify.com/inch15pmxli/1 – KOB Sep 16 '20 at 06:10
  • Is there any way to do something along the lines of `until(next_node_has_already_been_traversed)`? (But still include the edge to this node) – KOB Sep 16 '20 at 06:14
  • You can use something along the lines of `until(cyclicPath())` – Kelvin Lawrence Sep 16 '20 at 12:28
  • This gives results. You should be able to tinker with this to get the exact results you want (which edges etc) `g.V().hasLabel('customer_1').repeat(bothE(). otherV()). until(__.not(outE()).or().cyclicPath()).path()` – Kelvin Lawrence Sep 16 '20 at 12:38
  • Of course for your sample graph, all you would actually need is this but I believe from your description that the real graph will have cycles and you need to look at edges in both directions. `g.V().hasLabel('customer_1'). repeat(outE().inV()). until(__.not(outE())). path()` – Kelvin Lawrence Sep 16 '20 at 12:59
  • Thanks for your help. Your solution using cyclicPath() seems to be exactly what I need. One thing with it is that it returns a lot of paths. I want to make sure to return all edges and vertices, but not every path containing these needs to be traversed (actually the fewer paths, the better). I have slightly changed your original solution to this: `g.V().hasLabel('A').repeat(bothE().simplePath().otherV()).until(not(bothE().simplePath()).or().loops().is(5)).path().by(label)`. What do you think of this? Can you think of any cases this wouldn't act as I wish? – KOB Sep 16 '20 at 16:09
  • That looks good and it's sort of back to my original answer with the `loops` added. – Kelvin Lawrence Sep 16 '20 at 17:45
  • Yeah, the only change is moving the first simplePath to after the edge traversal step, so that edges in cycles will be included, but it will stop traversing once it hits the vertex that creates the cycle (I think...) – KOB Sep 16 '20 at 18:17
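Since the goal discussed in the comments is to get every edge and vertex rather than every path, one option is to collapse the returned paths on the client side. The following plain-Python sketch assumes each path alternates vertex, edge, vertex (as produced by `bothE().otherV()`), and that each element is a dict carrying its id under an `"id"` key. That key is a stand-in for illustration; with `valueMap(True)` in `gremlin_python` the id sits under a different key, so the `key` function would need adjusting. `collect_elements` is a hypothetical helper name.

```python
def collect_elements(paths, key=lambda element: element["id"]):
    """Collapse alternating vertex/edge paths into distinct vertices and edges."""
    vertices, edges = {}, {}
    for path in paths:
        for position, element in enumerate(path):
            # Even positions hold vertices, odd positions hold edges.
            target = vertices if position % 2 == 0 else edges
            # Keep the first copy seen for each id.
            target.setdefault(key(element), element)
    return vertices, edges


# Illustrative data only; real elements would carry the full property maps.
paths = [
    [{"id": 1, "label": "A"}, {"id": 10, "label": "knows"}, {"id": 2, "label": "B"}],
    [{"id": 1, "label": "A"}, {"id": 11, "label": "knows"}, {"id": 3, "label": "C"}],
]
vertices, edges = collect_elements(paths)
print(sorted(vertices), sorted(edges))
```

Vertices and edges are kept in separate dictionaries because vertex ids and edge ids are not guaranteed to be distinct from each other.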