JanusGraph Gremlin graph traversal with `as` and `select` provides unexpected result

Question

I have two graph traversals with the following results:

g.V().has("id", 2).outE("knows").inV()
==>v[4216]
==>v[8312]

g.V().has("id", 5).outE("knows").inV()
==>v[4216]
==>v[8312]

In other words, the vertices with id 2 and id 5 both have knows edges to the same two vertices, v[4216] and v[8312].

If I chain these two queries, label each with as(), and then select the first label, the result is not what I expected.

g.V().has("id", 2).outE("knows").inV().dedup().as('a').V().has('id', 5).outE('knows').inV().dedup().as('b').select('a')
==>v[4216]
==>v[4216]

Since I only select 'a', I expected the same result as running the first traversal on its own, namely v[4216] and v[8312].

Do you know what the issue could be?

The JanusGraph version is 0.5.3 and the TinkerPop version is 3.4.6.

Kelvin Lawrence · Answer 1 · 2021-09-20T20:18:30.507

4

This is actually working as expected. The second dedup removes the traversers that carried the other vertex as 'a'. Note also that your second V() runs once for every traverser that reaches it, which fans the query out. Here is an example that I hope makes it clear.

Using this graph:

g.addV('a').as('a').
  addV('b').as('b').
  addV('c').as('c').
  addV('d').as('d').
  addE('knows').from('a').to('c').
  addE('knows').from('a').to('d').
  addE('knows').from('b').to('c').
  addE('knows').from('b').to('d')

We can inspect the flow of the query:

With the second dedup

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').label()
==>c
==>c

Without the second dedup


gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').label()
==>c
==>c
==>d
==>d

Using a path step we can see exactly what happened

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').path()
==>[v[0],e[4][0-knows->2],v[2],v[1],e[6][1-knows->2],v[2],v[2]]
==>[v[0],e[4][0-knows->2],v[2],v[1],e[7][1-knows->3],v[3],v[2]]

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').path()
==>[v[0],e[4][0-knows->2],v[2],v[1],e[6][1-knows->2],v[2],v[2]]
==>[v[0],e[4][0-knows->2],v[2],v[1],e[7][1-knows->3],v[3],v[2]]
==>[v[0],e[5][0-knows->3],v[3],v[1],e[6][1-knows->2],v[2],v[3]]
==>[v[0],e[5][0-knows->3],v[3],v[1],e[7][1-knows->3],v[3],v[3]]

Here are the same queries, but with just the labels shown in the results.

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().dedup().as('b').select('a').path().by(label)
==>[a,knows,c,b,knows,c,c]
==>[a,knows,c,b,knows,d,c]

gremlin> g.V().hasLabel('a').outE("knows").inV().dedup().as('a').V().hasLabel('b').outE('knows').inV().as('b').select('a').path().by(label)
==>[a,knows,c,b,knows,c,c]
==>[a,knows,c,b,knows,d,c]
==>[a,knows,d,b,knows,c,d]
==>[a,knows,d,b,knows,d,d]
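
To check how many traversers reach each stage, the tail of the query can be swapped for count(). This is a minimal sketch against the sample graph above. It uses out('knows') as shorthand for outE('knows').inV(), and the select('a','b') form at the end is added here for illustration; it is not one of the original queries.

```groovy
// Traversers arriving at the second dedup(): each of the two 'a'
// traversers (c and d) re-runs V().hasLabel('b').out('knows').
g.V().hasLabel('a').out('knows').dedup().as('a').
  V().hasLabel('b').out('knows').
  count()
// ==>4

// Traversers surviving the second dedup(): only the first arrival at c
// and the first arrival at d are kept.
g.V().hasLabel('a').out('knows').dedup().as('a').
  V().hasLabel('b').out('knows').dedup().
  count()
// ==>2

// Show which 'a' each surviving traverser carries alongside its 'b'.
g.V().hasLabel('a').out('knows').dedup().as('a').
  V().hasLabel('b').out('knows').dedup().as('b').
  select('a','b').by(label)
```

Going by the path output above, both surviving traversers should carry c as 'a', which is why select('a') returns c twice.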

edited Sep 20 '21 at 20:18

answered Sep 20 '21 at 13:45

Kelvin Lawrence


When I `select('a')`, it should return `c` and `d`, since the traversal up to that point returns the vertices that `a` connects to. Why does it return `c` and `c`? @Kelvin – Hieu Nguyen Sep 20 '21 at 17:05
As I explained - the query creates four traversers. The second `dedup` will remove two of them and those happen to be the ones that were containing those other vertices. `select('a')` does not guarantee to "go back in time" - changes later in the query can affect what is stored as 'a' – Kelvin Lawrence Sep 20 '21 at 20:04
I added two more examples to the answer to see if that helps clarify. – Kelvin Lawrence Sep 20 '21 at 20:18
I think the addition of `by(label)` in those last two examples helps make it clear what's happening. When you use `V()` in the middle of the traversal, you traverse `b->c` and `b->d` twice, once for `a->c` and once for `a->d`. As a result, the `a->c` traverser already triggers traversal to "c" and "d" in the first two results, so when you `select('a')` you can only get "c" of `a->c`. The second `dedup()` removes any chance of `a->d` being evaluated. I'm not sure if that helps make anything more clear for you. – stephen mallette Sep 20 '21 at 20:36
I'd wonder if you really want to chain the `V()` steps the way that you are. Based on your expectations, it almost seems as though you would want to use a form of `union(V().out(), V().out()).dedup()` to get the answer you wanted. – stephen mallette Sep 20 '21 at 20:37
I don't quite get this part in your answer above: `As a result, the a->c traverser already triggers traversal to "c" and "d" in the first two results so when you select('a') you can only get "c" of a->c`. Why could `select('a')` not include a->d in this case (is it somehow tied to number two, then)? Maybe I don't really get what the `select()` step is doing... – Hieu Nguyen Sep 23 '21 at 00:13
I think that part of your misunderstanding is in how using `V()` in the middle of a traversal works. When you do `V().V()` you are essentially telling Gremlin to find all the vertices and then, for each one of those, find all the vertices again. Each vertex from the first `V()` triggers a traversal of all of `V()` again. So in your case, the finding of the two paths from "a" (`a->c` and `a->d`) is triggering the finding of paths from "b" (`b->c` and `b->d`) twice. The second `dedup()` removes the paths triggered by `a->d` because you've already visited them when triggered by `a->c`. – stephen mallette Sep 23 '21 at 15:24
As the `dedup()` removes the path triggered by `a->d` your ending `select()` never has a chance to get that result. – stephen mallette Sep 23 '21 at 15:25
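
If the goal is simply the distinct set of vertices reachable from either starting vertex, the two starts can be combined instead of chained. The following is a sketch against the sample graph from the answer, not a query tested on JanusGraph. The `inject(1)` start is an assumption added here because, as far as I know, `union()` cannot begin a traversal directly from `g` in TinkerPop 3.4.x.

```groovy
// Single start over both labels, then dedup the neighbours.
g.V().hasLabel('a','b').out('knows').dedup().label()

// The union() form suggested in the comments; inject(1) only supplies
// one traverser so that union() has something to run its branches on.
g.inject(1).
  union(__.V().hasLabel('a').out('knows'),
        __.V().hasLabel('b').out('knows')).
  dedup().
  label()
```

On the sample graph both forms should yield c and d once each. For the data in the question, the equivalent would be something along the lines of `g.V().has('id', within(2, 5)).out('knows').dedup()`.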
