Given two Gremlin queries q1 and q2 and their results ri = qi.toSet(), I want to find all nodes in r1 that have a connection to a node in r2 - ignoring edge labels and direction.

My current approach computes shortest paths between the two result sets:

q1.shortestPath().with_(ShortestPath.target, q2).toList()

However, I found that the shortest path calculation in TinkerPop is unsuitable for this purpose, because the result is empty if r1 contains nodes without any connection to a node in r2.

Instead, I thought about connected components, but the connectedComponents() step will yield all connected components found and I would have to filter them to find the connected component that meets the above requirements.
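To make the filtering idea concrete, here is a minimal client-side sketch in plain Python. It assumes the edges have already been fetched from the graph as pairs of vertex ids. The function name `connected_sources` and the toy data are hypothetical, and it uses a union-find structure rather than the connectedComponents() step.

```python
def connected_sources(edges, r1, r2):
    """Return the members of r1 that share a connected component with a member of r2."""
    parent = {}

    def find(v):
        # Look up the component representative, shortening the path on the way.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Edge labels and direction are ignored: an edge just merges two components.
    for a, b in edges:
        parent[find(a)] = find(b)

    target_components = {find(v) for v in r2}
    return {v for v in r1 if find(v) in target_components}


# Hypothetical toy data: 1-2-12 and 13-3 are connected, 99999 is isolated.
edges = [("1", "2"), ("2", "12"), ("13", "3"), ("4", "5")]
print(sorted(connected_sources(edges, {"1", "2", "3", "99999"}, {"12", "13"})))
```

Note that in this sketch a vertex contained in both r1 and r2 counts as connected to itself.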

Do you have suggestions on how I could tackle this problem in gremlin-python?

Green绿色

1 Answer

Here is one way of doing what I think you need in Gremlin Python. This may or may not be efficient depending on the size and shape of your graph. In my test graph, only vertices 1, 2, and 3 have a route to either 12 or 13. This example does not show which route was taken, only that at least one path exists.

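>>> from gremlin_python.process.graph_traversal import __
>>> from gremlin_python.process.traversal import within
>>> ids = g.V('1','2','3','99999').id().toList()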
>>> ids
['1', '2', '3', '99999']
>>> ids2 = g.V('12','13').id().toList()
>>> ids2
['12', '13']

>>> g.V(ids).filter(__.repeat(__.out().simplePath()).until(__.hasId(within(ids2))).limit(1)).toList()
[v[1], v[2], v[3]]

You can also use dedup() instead of simplePath() and limit() if you only care that any route exists.

g.V(ids).filter(__.repeat(__.out().dedup()).until(__.hasId(within(ids2)))).toList()
Kelvin Lawrence
  • Thanks for your suggestion. Since I'm interested in both directions, I replaced `out()` with `both()`. Unfortunately, I get a timeout if the initial set (i.e., g.V(ids)) contains more than 3 vertices. My graph consists of about 7,000 nodes and 19,000 edges. – Green绿色 Dec 28 '19 at 01:10
  • Could you say a bit more about your graph shape/schema? I tested the query using a graph which has 4K vertices and 50K edges and had no issues. I tried varying numbers of source and target vertices. I also edited my answer to show an alternative way of writing the query. – Kelvin Lawrence Dec 28 '19 at 17:55
  • The dedup() form will be more performant in cases where you just want to know that at least one route exists between each source vertex and one of the target vertices. – Kelvin Lawrence Dec 28 '19 at 18:02
  • The graph consisted of several fully connected components. I suspected that there might be too many paths in this graph. – Green绿色 Feb 01 '20 at 10:45
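
Following up on the path explosion mentioned in the last comment: for a graph of this size, one option is to pull the edge list to the client and run a single breadth-first search that starts from all target vertices at once. The sketch below is plain Python, not a Gremlin traversal. The function name `reachable_sources` and the toy data are hypothetical, and it assumes the edges are available as pairs of vertex ids.

```python
from collections import defaultdict, deque


def reachable_sources(edges, sources, targets):
    """Return the sources that can reach at least one target, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Start from all targets at once; each vertex is expanded at most once.
    seen = set(targets)
    queue = deque(targets)
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return set(sources) & seen


# Hypothetical toy data: a fully connected component {a, b, c, d} attached to target t.
clique = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
edges = clique + [("d", "t")]
print(sorted(reachable_sources(edges, {"a", "z"}, {"t"})))
```

Because every vertex enters the queue at most once, the work grows with the number of vertices plus edges, not with the number of distinct paths, so fully connected components should not cause a blow-up here.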