May I get some help improving our query?

The idea is that we have many connected subgraphs, and each vertex has a unique ID. We know some of the IDs, and we want to get all of the vertices connected to them in one query.

For example, we have a -> b <- c -> d and e -> f <- g. If the input is {b, e}, the result we want is {a, b, c, d, e, f, g}, because {a, c, d} are connected to b and {f, g} are connected to e.

Currently I'm using a very naive query like:

g.V("b").emit().repeat(both().simplePath()).aggregate("connected")
 .V("e").emit().repeat(both().simplePath()).aggregate("connected")
 .select("connected").unfold().dedup()

This works sometimes, but if all the vertices are already connected to each other, I run into a MemoryLimitExceededException.

Stanislav Kralin

1 Answer

The Gremlin Recipes include a connected components recipe, but it may not be appropriate for very large graphs. Please take a look at [1].

[1] https://tinkerpop.apache.org/docs/current/recipes/#connected-components
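
To illustrate what a lookup like this has to compute, here is a minimal in-memory sketch in Python rather than Gremlin. It assumes the edges fit in memory as (source, target) pairs; the function name `connected_vertices` and the edge list are hypothetical and only mirror the example in the question. The point it tries to show is that a single shared visited set lets each vertex be expanded once, whereas `repeat(both().simplePath())` tracks a separate traverser for each simple path, and the number of simple paths can grow very quickly when the graph is densely connected.

```python
from collections import defaultdict, deque


def connected_vertices(edges, seeds):
    """Return every vertex reachable from any seed, ignoring edge direction."""
    # Build an undirected adjacency map, since both() follows edges either way.
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # One visited set shared by all seeds: each vertex is queued at most once.
    visited = set(seeds)
    queue = deque(visited)
    while queue:
        vertex = queue.popleft()
        # The set difference is a new set, so adding to visited here is safe.
        for nxt in neighbours[vertex] - visited:
            visited.add(nxt)
            queue.append(nxt)
    return visited


# Hypothetical data matching the question: a -> b <- c -> d and e -> f <- g.
edges = [("a", "b"), ("c", "b"), ("c", "d"), ("e", "f"), ("g", "f")]
print(sorted(connected_vertices(edges, {"b", "e"})))
# prints ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

This is only a sketch of the idea, not a replacement for a server-side query; for graphs too large to pull into a client, the recipe in [1] is the place to start.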

Kelvin Lawrence