I'm following up on these two questions:

gremlin intersection operation

JanusGraph Gremlin graph traversal with `as` and `select` provides unexpected result

I read StackOverflow intensively (and want to thank the community!), but I haven't posted much, so I don't have enough reputation to comment on the posts above. That's why I'm asking my questions here.

Regarding the 2nd post above: Hieu and I work together, and I want to provide a bit more background on the question.

Stephen asked in a comment on the 2nd post why I want to chain V() in the middle. The reason is simply that I want to start the traversal from the beginning again, i.e. from each and every node of the whole graph, just as g.V() does at the beginning of most queries in the Gremlin documentation.

A bit more illustration: suppose I need 2 conditional filters on the results. Basically I want to write

g.V().(Condition-A).as('setA')
 .V().(Condition-B).as('setB')
 .select('setA')
 .where('setA',eq('setB'))

which borrows the last query from Stephen's answer in the 1st post. Here Condition-A and Condition-B are each just a chain of filter steps such as has or hasLabel.

What should I write at the place of .V() in the middle? Or is there some other way to write the query so that Condition-B is completely independent of Condition-A?

Finally, I've read the section on chaining V() in the middle of a query at https://tinkerpop.apache.org/docs/3.5.0/reference/#graph-step. I still cannot fully understand the weird results in the 2nd post; maybe I should read more about how traversers work?

Thanks again, Kelvin and Stephen. I'm glad and excited to connect with the people who wrote a book on Gremlin and the source code for it.

Hengzhi

1 Answer

In the middle of a traversal, a V() is applied to every traverser that has been created by the prior steps. Consider this example using the air-routes data set:

g.V(1,2,3)

This will yield three results:

v[1]
v[2]
v[3]

and if we count all vertices in the graph:

gremlin> g.V().count()
==>3747 

the count is 3,747. If we now do:

gremlin> g.V(1,2,3).V().count()
==>11241

the count is 11,241 (exactly 3 times 3,747). This is because for each of the three traversers produced by g.V(1,2,3), the second V() visits every vertex in the graph.
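
The multiplication can be reproduced with a small pure-Python toy model. This is only a sketch of the counting behaviour, not how TinkerPop is implemented: `ALL_VERTICES`, `start_v` and `mid_v` are hypothetical names, and the vertex ids are stand-ins rather than the real air-routes ids.

```python
# Toy model: a traversal is a list of traversers, each holding one vertex id.
# ALL_VERTICES is a hypothetical stand-in for the 3,747 vertices in air-routes.
ALL_VERTICES = list(range(3747))

def start_v(*ids):
    """Model of g.V(ids) at the start: one traverser per requested vertex."""
    return list(ids)

def mid_v(traversers):
    """Model of a mid-traversal V(): every incoming traverser spawns
    one new traverser per vertex in the graph."""
    return [v for _ in traversers for v in ALL_VERTICES]

result = mid_v(start_v(1, 2, 3))
print(len(result))  # 3 traversers * 3747 vertices = 11241
```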

EDITED to add:

If you need to aggregate some results and then explore the graph again using those results as a filter, one way is to introduce a fold step. This will collapse all of the traversers back into one again. This ensures that the second V step will not be repeated multiple times by any prior fan out.

gremlin> g.V(1,2,3).fold().as('a').V().where(within('a'))
==>v[1]
==>v[2]
==>v[3]

gremlin> g.V(1,2,3).fold().as('a').V().where(without('a')).limit(5)
==>v[0]
==>v[4]
==>v[5]
==>v[6]
==>v[7]    
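
To see why the fold helps, here is a pure-Python sketch of the same two queries on a hypothetical ten-vertex graph. The helper names (`fold`, `mid_v_where`, `within`, `without`) only mimic the Gremlin steps for illustration; they are assumptions of this sketch, not TinkerPop APIs.

```python
# Hypothetical tiny graph with vertex ids 0..9.
ALL_VERTICES = list(range(10))

def fold(traversers):
    """Model of fold(): collapse all traversers into ONE traverser holding a list."""
    return [list(traversers)]

def mid_v_where(traversers, keep):
    """Model of .as('a').V().where(<predicate>('a')): for each incoming
    traverser (a folded list), emit every vertex that passes keep()."""
    return [v for folded in traversers for v in ALL_VERTICES if keep(v, folded)]

def within(v, folded):
    return v in folded

def without(v, folded):
    return v not in folded

set_a = [1, 2, 3]                                    # result of g.V(1,2,3)
inside = mid_v_where(fold(set_a), within)
outside = mid_v_where(fold(set_a), without)[:5]      # [:5] models limit(5)
print(inside)   # [1, 2, 3]
print(outside)  # [0, 4, 5, 6, 7]
```

Because `fold` leaves a single traverser, the scan over all vertices happens once rather than once per element of the first result set. Applied to the placeholders in the question, the same shape would presumably read g.V().(Condition-A).fold().as('setA').V().(Condition-B).where(within('setA')), with Condition-B evaluated independently of Condition-A.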

EDITED again to add:

The key part I think people sometimes struggle with is how Gremlin traversals flow. You can think of a query as containing/spawning one or more parallel streams (it may not be executed that way but conceptually it helps me to think of it that way). So g.V('1') creates one stream (we often refer to them as traversers). However g.V('1').out() might create multiple traversers if there is more than one outgoing edge originating from V('1'). When a fold is encountered the traversers are all collapsed back down to one again.
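
The stream picture can be sketched in a few lines of Python. The adjacency list `OUT_EDGES` and the functions `v`, `out` and `fold` are hypothetical stand-ins chosen for illustration; real Gremlin traversers carry more state than a bare vertex id.

```python
# Hypothetical adjacency list: vertex id -> ids reached over outgoing edges.
OUT_EDGES = {"1": ["2", "3", "4"], "2": [], "3": ["4"], "4": []}

def v(*ids):
    """Model of g.V(ids): one traverser per id."""
    return list(ids)

def out(traversers):
    """Model of out(): one new traverser per outgoing edge of each traverser."""
    return [target for vid in traversers for target in OUT_EDGES[vid]]

def fold(traversers):
    """Model of fold(): all traversers collapse into a single one holding a list."""
    return [list(traversers)]

one = v("1")             # a single stream
many = out(one)          # fans out: vertex "1" has three outgoing edges
collapsed = fold(many)   # back down to one traverser
print(len(one), len(many), len(collapsed))  # 1 3 1
```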

Kelvin Lawrence
  • Thanks Kelvin, this clarifies a lot of things to me. – Hieu Nguyen Sep 29 '21 at 17:42
  • Thank you Kevin. If I want to start the traversal after `as('setA')` (in my original post) from the beginning (i.e. each and every node of the whole graph just like what g.V() does, independent of `Condition-A`) to construct `Condition-B`, what should I write at the place of `.V()` in the middle of the query? Is there a way to do so? – Hengzhi Oct 01 '21 at 07:01
  • Just edited my original question as well. Looks like there's line diff like Github for my editing history, not sure if you guys can see it... fixed some important typo as well. – Hengzhi Oct 01 '21 at 07:10
  • In my answer above I used a `fold` to "prepare" for the second `V()`. That `fold` flattens the traversal back down to one again so the second `V()` is not repeated multiple times. – Kelvin Lawrence Oct 01 '21 at 20:35
  • The key part I think that you are struggling with is how Gremlin traversals flow. You can think of a query as containing one or more parallel streams (it may not be executed that way but conceptually it helps me to think of it that way). So `g.V('1')` creates one stream (we often refer to them as traversers). However `g.V('1').out()` might create multiple traversers if there is more than one outgoing edge. When a `fold` is encountered they are all collapsed back down to one again. I'll also add this to the answer. – Kelvin Lawrence Oct 01 '21 at 20:38