I need to find all cliques of size three in my graph with Gremlin. I was able to do this in Neo4j with Cypher:

MATCH (a)-[:edge]-(b)-[:edge]-(c)-[:edge]-(a)
RETURN a,b,c

The example case is: A->B->C->A

One possible solution, based on @pkohan's answer, is:

g.V().as('x').sideEffect{x = it}.out().loop(1){it.loops < 4}{it.loops == 4 && it.object.id == x.id}.path.dedup().collect{"${it[0].id}->${it[1].id}->${it[2].id}"}

Does anyone have another idea?

crhistian
    I don't think this should be closed for being too broad. It's a fairly specific question about cypher to gremlin conversion. The answer below shows that it's understandable by those familiar with cypher and gremlin graph query languages. – stephen mallette Apr 17 '15 at 12:36

2 Answers

Here's a query that would be extremely inefficient on a large graph, but does what you would expect:

g.V().filter{it.out().loop(1){it.loops < 4}.id.filter{i -> it.id == i}.hasNext()}.map

This returns a pipe containing the vertices that can reach themselves by walking three outgoing edges. In Gremlin 2, it.loops < 4 makes the traversal take the out() step three times (it.loops < 3 would take it only twice), so change that bound in the closure to follow a different number of edges. You can follow incoming edges by changing out() to in(), or either direction by using both(). You can also restrict the traversal to particular edge labels by passing them as arguments, for example:

g.V().filter{it.out("EDGE_TYPE").loop(1){it.loops < 4}.id.filter{i -> it.id == i}.hasNext()}.map

I'm unsure whether Neo4j has optimizations that make this query feasible on a large database, but I'd imagine that running it on a Titan graph with millions of edges and vertices would be dangerous.
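To make the idea behind the filter concrete, here is a minimal plain-Python sketch (not Gremlin) of the same check: starting from each vertex, follow outgoing edges a fixed number of times and keep the vertex if it is reachable from itself. The names `out_edges` and `returns_to_self` and the extra vertex D are hypothetical, chosen only for illustration. The sketch also tracks a set of reached vertices rather than individual paths, which is a simplification of what the traversal does.

```python
def returns_to_self(out_edges, start, hops):
    """True if `start` can be reached from itself in exactly `hops` outgoing steps."""
    frontier = {start}
    for _ in range(hops):
        # Expand every vertex in the frontier along its outgoing edges.
        frontier = {w for v in frontier for w in out_edges.get(v, ())}
    return start in frontier


# The example case from the question, A->B->C->A, plus a hypothetical
# vertex D that points into the cycle but is not part of it.
out_edges = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

three_hops = [v for v in out_edges if returns_to_self(out_edges, v, 3)]
two_hops = [v for v in out_edges if returns_to_self(out_edges, v, 2)]

print(three_hops)  # ['A', 'B', 'C']
print(two_hops)    # []
```

Because the expansion is repeated for every start vertex, the cost grows with both the number of vertices and their out-degree, which is why the query above is risky on a large graph.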

pkohan

I assume that you're using Gremlin2.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V().as('x').both().loop('x') {it.loops < 3}.both().retain('x').path()
==>[v[1], v[3], v[4], v[1]]
==>[v[1], v[4], v[3], v[1]]
==>[v[3], v[4], v[1], v[3]]
==>[v[3], v[1], v[4], v[3]]
==>[v[4], v[1], v[3], v[4]]
==>[v[4], v[3], v[1], v[4]]

As you can see, this is the same clique six times over: each result starts at a different vertex or walks the triangle in a different direction. Here's how you can deduplicate the result set:

gremlin> g.V().as('x').both().except('x').loop('x') {it.loops < 3}.both().retain('x').path().transform {it[0..2].sort()}.dedup()
==>[v[1], v[3], v[4]]
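
The reason for the six results, and why sorting the first three path elements removes the duplicates, can be sketched in plain Python (not Gremlin). The edge list below is my reading of the TinkerGraph toy graph and should be treated as an assumption; `build_adjacency` and `closed_paths` are hypothetical helper names. Unlike the traversal, the sketch also insists that the three vertices are distinct.

```python
# Undirected view of the six edges in the toy graph (assumed edge list).
EDGES = [(1, 2), (1, 3), (1, 4), (4, 3), (4, 5), (6, 3)]


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours, ignoring edge direction."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def closed_paths(adjacency):
    """Return every ordered (a, b, c) where a-b, b-c and c-a are all edges."""
    paths = []
    for a in sorted(adjacency):
        for b in sorted(adjacency[a] - {a}):
            for c in sorted(adjacency[b] - {a, b}):
                if a in adjacency[c]:
                    paths.append((a, b, c))
    return paths


adjacency = build_adjacency(EDGES)
paths = closed_paths(adjacency)

# Sorting each path gives one canonical form per triangle, so a set
# collapses the 3 start vertices x 2 directions into a single entry.
triangles = sorted({tuple(sorted(p)) for p in paths})

print(len(paths))   # 6
print(triangles)    # [(1, 3, 4)]
```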

Since the path you're looking for has a fixed length, you can also get rid of the loop construct (which should lead to a faster execution):

gremlin> g.V().as('a').both().as('b').except('a').both().as('c').both().retain('a').path().transform {it[0..2].sort()}.dedup()
==>[v[1], v[3], v[4]]

You can do almost the same thing in Gremlin3, but you also have the option of using the new match step to look for the pattern (which should be pretty straightforward if you know how to write the query in Cypher):

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal(standard())
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().match("a",
gremlin>   __.as("a").both().as("b"),
gremlin>   __.as("b").both().as("c"),
gremlin>   __.as("c").both().as("d")).where("a", eq("d")).select("a", "b", "c").map {it.get().values().sort()}.dedup()
==>[v[1], v[3], v[4]]
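
For readers who think in terms of pattern bindings, the match-based query can be approximated by this plain-Python sketch (not Gremlin): each pattern extends the current variable bindings by one adjacent vertex, the a == d condition is applied afterwards, and the projected (a, b, c) values are sorted and deduplicated. The `match` helper is hypothetical and much simpler than the real step: it assumes every pattern's source variable is already bound and its target variable is new. The edge list is again an assumed copy of the toy graph.

```python
# Undirected view of the six edges in the toy graph (assumed edge list).
EDGES = [(1, 2), (1, 3), (1, 4), (4, 3), (4, 5), (6, 3)]

adjacency = {}
for u, v in EDGES:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def match(adjacency, patterns, first):
    """Bind `first` to every vertex, then extend the bindings one pattern at a time."""
    bindings = [{first: v} for v in sorted(adjacency)]
    for src, dst in patterns:
        # Each (src, dst) pattern binds dst to every neighbour of the src vertex.
        bindings = [
            {**b, dst: w}
            for b in bindings
            for w in sorted(adjacency[b[src]])
        ]
    return bindings


patterns = [("a", "b"), ("b", "c"), ("c", "d")]

# Equivalent of the where-clause: keep bindings whose walk ends where it began.
rows = [b for b in match(adjacency, patterns, "a") if b["a"] == b["d"]]

# Equivalent of select + sort + dedup.
cliques = sorted({tuple(sorted((b["a"], b["b"], b["c"]))) for b in rows})

print(cliques)  # [(1, 3, 4)]
```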
Daniel Kuppitz