
I have two graph traversals with the following results:

g.V().has("id", 2).outE("knows").inV()
==>v[4216]
==>v[8312]
g.V().has("id", 5).outE("knows").inV()
==>v[4216]
==>v[8312]

Both vertices, with id 2 and id 5, have "knows" edges to the same two vertices, v[4216] and v[8312].

Now, if I chain those two queries, label them with as(), and then select the first label, the result is not what I expected:

g.V().has("id", 2).outE("knows").inV().dedup().as('a').V().has('id', 5).outE('knows').inV().dedup().as('b').select('a')
==>v[4216]
==>v[4216]

Because I only select 'a', I expected the same result as running the first traversal on its own, namely v[4216] and v[8312].

Do you know what the issue could be?

The JanusGraph version is 0.5.3 and the TinkerPop version is 3.4.6.

Hieu Nguyen

1 Answer


This is actually working as expected. The second dedup() removes the traversers that carried the other vertices. Note also that your second V() makes the query fan out further: every traverser that reaches it starts a new scan of all the vertices. Here is an example that I hope makes it clear.

Using this graph:

g.addV('a').as('a').
  addV('b').as('b').
  addV('c').as('c').
  addV('d').as('d').
  addE('knows').from('a').to('c').
  addE('knows').from('a').to('d').
  addE('knows').from('b').to('c').
  addE('knows').from('b').to('d')   

We can inspect the flow of the query:

With the second dedup

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').label()
==>c
==>c

Without the second dedup


gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').label()
==>c
==>c
==>d
==>d
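To make the traverser flow concrete, here is a toy Python model of the two queries above. It is not TinkerPop's implementation; it is a sketch that assumes each step handles traversers in arrival order and that dedup() keeps the first traverser to reach each element. The helper names (start, mid_V, has_label, out_e, in_v, dedup, as_, select_label, run) are hypothetical stand-ins for the Gremlin steps, and the exact order in which a real engine feeds traversers into dedup() may differ.

```python
# Toy model of the answer's graph (not TinkerPop's implementation).
# Vertex id -> label, and "knows" edge id -> (out vertex id, in vertex id).
VERTICES = {0: "a", 1: "b", 2: "c", 3: "d"}
KNOWS = {4: (0, 2), 5: (0, 3), 6: (1, 2), 7: (1, 3)}

# A traverser is (path, labels): path is a tuple of ("v"|"e", id) elements,
# labels maps an as() name to the element that was current at that step.


def start():
    # g.V(): one traverser per vertex
    return [((("v", v),), {}) for v in VERTICES]


def mid_V(ts):
    # Mid-traversal V(): every incoming traverser fans out to ALL vertices
    return [(p + (("v", v),), lab) for p, lab in ts for v in VERTICES]


def has_label(ts, label):
    return [(p, lab) for p, lab in ts if VERTICES[p[-1][1]] == label]


def out_e(ts):
    # Follow the outgoing "knows" edges of the current vertex
    return [(p + (("e", e),), lab)
            for p, lab in ts
            for e, (src, _) in KNOWS.items() if src == p[-1][1]]


def in_v(ts):
    # Move from the current edge to its incoming vertex
    return [(p + (("v", KNOWS[p[-1][1]][1]),), lab) for p, lab in ts]


def dedup(ts):
    # Keep only the first traverser to arrive at each current element
    seen, kept = set(), []
    for p, lab in ts:
        if p[-1] not in seen:
            seen.add(p[-1])
            kept.append((p, lab))
    return kept


def as_(ts, name):
    # as(name): remember the current element under that name
    return [(p, {**lab, name: p[-1]}) for p, lab in ts]


def select_label(ts, name):
    # select(name).label()
    return [VERTICES[lab[name][1]] for _, lab in ts]


def run(second_dedup):
    ts = as_(dedup(in_v(out_e(has_label(start(), "a")))), "a")
    ts = in_v(out_e(has_label(mid_V(ts), "b")))
    if second_dedup:
        ts = dedup(ts)
    return select_label(as_(ts, "b"), "a")


print(run(True))   # ['c', 'c']
print(run(False))  # ['c', 'c', 'd', 'd']
```

In this model the mid-traversal V() turns the two 'a' traversers into eight, hasLabel('b') keeps two, outE().inV() yields four, and the second dedup() keeps only the first traverser to reach c and the first to reach d. Both of those came from the a->c branch, which is why select('a') reports c twice.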

Using a path() step, we can see exactly what happened:

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').path()
==>[v[0],e[4][0-knows->2],v[2],v[1],e[6][1-knows->2],v[2],v[2]]
==>[v[0],e[4][0-knows->2],v[2],v[1],e[7][1-knows->3],v[3],v[2]]

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').path()
==>[v[0],e[4][0-knows->2],v[2],v[1],e[6][1-knows->2],v[2],v[2]]
==>[v[0],e[4][0-knows->2],v[2],v[1],e[7][1-knows->3],v[3],v[2]]
==>[v[0],e[5][0-knows->3],v[3],v[1],e[6][1-knows->2],v[2],v[3]]
==>[v[0],e[5][0-knows->3],v[3],v[1],e[7][1-knows->3],v[3],v[3]]

Here are the same queries, but with just the labels shown in the results:

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').path().by(label)
==>[a,knows,c,b,knows,c,c]
==>[a,knows,c,b,knows,d,c]   

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').path().by(label)
==>[a,knows,c,b,knows,c,c]
==>[a,knows,c,b,knows,d,c]
==>[a,knows,d,b,knows,c,d]
==>[a,knows,d,b,knows,d,d]  
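The effect of the second dedup() can be read straight off these label paths. The Python sketch below copies the four paths printed by the query without the second dedup(), in output order, and keeps only the first path to reach each vertex at position 5 (the vertex reached from b). It assumes dedup() sees traversers in that printed order, and it uses labels in place of vertex ids, which only works because each vertex label is unique in this graph.

```python
# Label paths printed by the query WITHOUT the second dedup(), in output order.
# Positions: 0 start vertex, 1 edge, 2 vertex tagged 'a', 3 the 'b' vertex,
# 4 edge, 5 vertex the second dedup() would look at, 6 result of select('a').
paths = [
    ["a", "knows", "c", "b", "knows", "c", "c"],
    ["a", "knows", "c", "b", "knows", "d", "c"],
    ["a", "knows", "d", "b", "knows", "c", "d"],
    ["a", "knows", "d", "b", "knows", "d", "d"],
]

# Simulate the second dedup(): keep the first path to reach each vertex at
# position 5; later paths reaching an already-seen vertex are dropped.
seen = set()
kept = []
for path in paths:
    if path[5] not in seen:
        seen.add(path[5])
        kept.append(path)

for path in kept:
    print(path)
print([path[2] for path in kept])  # ['c', 'c'] -> what select('a') returns
```

The two surviving paths are exactly the two printed by the query with the second dedup(), and both have c at the 'a' position.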
Kelvin Lawrence
  • When I call `select('a')`, it should return `c` and `d`, because the traversal up to that point should return the vertices that `a` connects to. Why does it return `c` and `c`? @Kelvin – Hieu Nguyen Sep 20 '21 at 17:05
  • 1
    As I explained - the query creates four traversers. The second `dedup` will remove two of them and those happen to be the ones that were containing those other vertices. `select('a')` does not guarantee to "go back in time" - changes later in the query can affect what is stored as 'a' – Kelvin Lawrence Sep 20 '21 at 20:04
  • I added two more examples to the answer to see if that helps clarify. – Kelvin Lawrence Sep 20 '21 at 20:18
  • I think the addition of `by(label)` in those last two examples helps make it clear what's happening. When you use `V()` in the middle of the traversal, you traverse `b->c` and `b->d` twice, once for `a->c` and once for `a->d`. As a result, the `a->c` traverser already triggers traversal to "c" and "d" in the first two results, so when you `select('a')` you can only get "c" of `a->c`. The second `dedup()` removes any chance of `a->d` being evaluated. I'm not sure if that makes anything clearer for you. – stephen mallette Sep 20 '21 at 20:36
  • I wonder whether you really want to chain the `V()` steps the way you are. Based on your expectations, it almost seems as though you would want to use a form of `union(V().out(), V().out()).dedup()` to get the answer you wanted. – stephen mallette Sep 20 '21 at 20:37
  • I don't quite get this part of your comment above: `As a result, the a->c traverser already triggers traversal to "c" and "d" in the first two results so when you select('a') you can only get "c" of a->c`. Why couldn't `select('a')` include a->d in this case (is it somehow tied to the number two)? Maybe I don't really understand what the `select()` step is doing... – Hieu Nguyen Sep 23 '21 at 00:13
  • 1
    I think that part of you're misunderstanding is in how using `V()` in the middle of a traversal works. When you do `V().V()` you are essentially telling Gremlin to find all the vertices and then for each one of those find all the vertices again. Each vertex from the first `V()` triggers a traversal of all of `V()` again. So in your case, the finding of the two paths from "a" (`a->c` and `a->d`) is triggering the finding of paths from "b" (`b->c` and `b->d`) twice. The second `dedup()` removes the paths triggered by `a->d` because you've already visited them when triggered by `a->c`. – stephen mallette Sep 23 '21 at 15:24
  • As the `dedup()` removes the path triggered by `a->d` your ending `select()` never has a chance to get that result. – stephen mallette Sep 23 '21 at 15:25
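The union() suggestion amounts to running the two lookups independently and deduplicating their combined output, so nothing in the second lookup can filter out results of the first. Below is a minimal Python sketch of that idea using the vertex ids from the question; the KNOWS_OUT dictionary and the out_knows and union_dedup helpers are hypothetical stand-ins, not JanusGraph or TinkerPop APIs.

```python
# Hypothetical stand-in for the question's data:
# id -> vertices reached via out('knows')
KNOWS_OUT = {2: ["v[4216]", "v[8312]"], 5: ["v[4216]", "v[8312]"]}


def out_knows(vertex_id):
    # Models V().has('id', vertex_id).out('knows') as an independent branch
    return list(KNOWS_OUT[vertex_id])


def union_dedup(*branches):
    # Models union(branch1, branch2, ...).dedup(): concatenate the branch
    # outputs, then keep the first occurrence of each element
    seen, result = set(), []
    for branch in branches:
        for v in branch:
            if v not in seen:
                seen.add(v)
                result.append(v)
    return result


print(union_dedup(out_knows(2), out_knows(5)))  # ['v[4216]', 'v[8312]']
```

In the Gremlin console the corresponding shape might look like `g.inject(1).union(V().has('id', 2).out('knows'), V().has('id', 5).out('knows')).dedup()`; treat that exact form as an untested assumption rather than a verified query.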