Most optimal gremlin query for getting all related vertices

Question

I need to get all the vertices connected to a starting vertex through any number of relationships. I have a working query, but it slows down significantly once the graph grows past a few hundred edges and becomes more complex. Is there a more efficient way to get the related vertices?

g.V(id)
 .emit()
 .repeat(both())
 .until(cyclicPath())
 .unfold()
 .dedup()
 .toList()

One example of the performance breakdown was a subgraph with 202 vertices and 259 edges. According to a profile run, the query seems to have issued 1,444,439 traversals and took around 80s.

Additional info: this was run on AWS Neptune 1.0.1.0.200258.0.

score 3 · Accepted Answer · answered Nov 15 '18 at 15:02

Looks like you only want to find all vertices that are somehow connected to the initial vertex. Try this query (it doesn't enable path tracking and thus should be much faster):

g.V(id).emit().repeat(both().dedup())
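
To make the difference concrete, here is a rough Python model (not Gremlin, and not how Neptune actually executes either query) that compares the two strategies on a plain adjacency dict. `count_simple_paths` stands in for a path-tracking traversal that keeps going until a path becomes cyclic, and `reachable` stands in for `dedup()` inside `repeat()`. The function names and the small test graph are made up for illustration.

```python
from collections import deque


def count_simple_paths(adj, start):
    """Count every simple path that begins at `start` (the start alone counts as one).

    Rough stand-in for a path-tracking traversal: one unit of work per path.
    """
    count = 0
    stack = [(start, frozenset([start]))]
    while stack:
        vertex, on_path = stack.pop()
        count += 1
        for neighbour in adj[vertex]:
            if neighbour not in on_path:  # extending further would make the path cyclic
                stack.append((neighbour, on_path | {neighbour}))
    return count


def reachable(adj, start):
    """Return (set of connected vertices, number of vertices expanded)."""
    seen = {start}
    queue = deque([start])
    expanded = 0
    while queue:
        vertex = queue.popleft()
        expanded += 1
        for neighbour in adj[vertex]:
            if neighbour not in seen:  # plays the role of dedup() inside repeat()
                seen.add(neighbour)
                queue.append(neighbour)
    return seen, expanded


# Hypothetical test graph: 5 vertices, every pair connected.
complete5 = {v: [w for w in range(5) if w != v] for v in range(5)}
paths = count_simple_paths(complete5, 0)
vertices, expanded = reachable(complete5, 0)
```

On this 5-vertex graph the path-based count is 65, while the deduplicating walk expands each of the 5 vertices exactly once. The gap widens quickly as the graph gets denser, which is consistent with the traversal count reported in the question.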

Daniel Kuppitz

This does indeed seem faster. The issue I am running into is that the initial vertex is included twice, so I have to add another `dedup()` at the end. – Dave Zabriskie Nov 15 '18 at 15:22
Keep a single `dedup()`, but move `emit()` behind `repeat()`. This will only become an issue if the initial vertex has no edges at all; in all other cases it does what you want. – Daniel Kuppitz Nov 15 '18 at 15:28
Yes, we want the query to return the single vertex, if it exists, so having the `emit()` before the `repeat()` is desired. – Dave Zabriskie Nov 15 '18 at 15:36
