The following query returns a user map with an "injected" property called "questions". It works as expected when g.V().has() returns a single user, but not when it returns multiple users:

  return g.V().has("user", "userId", 1)
      .union(
         __.valueMap().by(__.unfold()),
         __.project('questions').by(
            __.outE('response').valueMap().by(__.unfold()).fold()
         )
      )
      .unfold()
      .group()
      .by(__.select(column.keys))
      .by(__.select(column.values));

The query above works, but not when I change the first line so that it matches multiple users:

g.V().hasLabel("user").union(....

I finish the query by calling .toList(), so I expected to get a list of all the users, each shaped like the single-user result. Instead I still get a single user. How can I make the query work for both a single user and multiple users?

fermmm

1 Answer

When using Gremlin, you have to think in terms of a stream. The stream contains traversers which travel through the steps you've written. In your case, with your initial test of:

g.V().has("user", "userId", 1)
      .union(
         __.valueMap().by(__.unfold()),
         __.project('questions').by(
            __.outE('response').valueMap().by(__.unfold()).fold()
         )
      )
      .unfold()
      .group()
      .by(__.select(column.keys))
      .by(__.select(column.values))

you have one traverser (i.e. V().has("user", "userId", 1) produces one user) that flows into union() and is split so that it goes to both valueMap() and project(), each of which produces a Map instance. You now have two traversers, which unfold() flattens into a stream of entries that group() then combines into one final Map traverser.

So with that in mind, what changes when you use hasLabel("user")? You now have more than one starting traverser, so union() produces two traversers for each of those users. Each is flattened into the stream by unfold(), and because group() collects the whole stream into a single Map and every user contributes the same keys, the entries overwrite one another and you get one final Map.
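
To make the collapse concrete, here is a minimal plain-JavaScript sketch that models the stream with arrays instead of running Gremlin. The `users` data and the helpers `unionStep`, `unfoldEntries` and `groupOneValuePerKey` are hypothetical stand-ins, and "last entry wins" is a simplification; which value Gremlin actually keeps per key is not modelled here.

```javascript
// Hypothetical data: two "user" vertices, each with its outgoing "response" edges.
const users = [
  { props: { userId: 1, name: "ann" }, responses: [{ answer: "yes" }] },
  { props: { userId: 2, name: "bob" }, responses: [{ answer: "no" }, { answer: "maybe" }] },
];

// Models union(valueMap(), project('questions')): one input traverser yields two maps.
function unionStep(user) {
  return [{ ...user.props }, { questions: user.responses }];
}

// Models unfold() on a map: each key/value entry becomes its own traverser.
function unfoldEntries(map) {
  return Object.entries(map);
}

// Models the final group(): one result map with a single value kept per key.
// Simplification: the last entry seen wins.
function groupOneValuePerKey(entries) {
  const result = {};
  for (const [key, value] of entries) {
    result[key] = value;
  }
  return result;
}

// The whole stream at once: entries from every user land in the same group().
const allEntries = users.flatMap(unionStep).flatMap(unfoldEntries);
const collapsed = groupOneValuePerKey(allEntries);

console.log(allEntries.length);      // 3 entries per user, 2 users
console.log(Object.keys(collapsed)); // only one set of keys survives
```

Six entries go in, but only three keys come out, because both users contribute `userId`, `name` and `questions` to the same map.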

You really want to execute union() and the follow-on steps once per initial "user" vertex traverser. You can tell Gremlin to do that with map():

g.V().has("user", "userId", 1)
      .map(
        __.union(
           __.valueMap().by(__.unfold()),
           __.project('questions').by(
              __.outE('response').valueMap().by(__.unfold()).fold()
           )
        )
        .unfold()
        .group()
          .by(__.select(column.keys))
          .by(__.select(column.values))
      )
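
As a rough illustration of what wrapping the steps in map() buys you, the following plain-JavaScript sketch (not Gremlin) runs the union/unfold/group pipeline separately for each user. The `users` data and the `userToMap` helper are hypothetical and only mimic the shape of the traversal.

```javascript
// Hypothetical data: two "user" vertices, each with its outgoing "response" edges.
const users = [
  { props: { userId: 1, name: "ann" }, responses: [{ answer: "yes" }] },
  { props: { userId: 2, name: "bob" }, responses: [{ answer: "no" }, { answer: "maybe" }] },
];

// Models the inner traversal (union -> unfold -> group) for ONE user traverser.
function userToMap(user) {
  // union(): the user is split into a property map and a "questions" map.
  const maps = [{ ...user.props }, { questions: user.responses }];
  // unfold(): every key/value entry becomes its own item in the stream.
  const entries = maps.flatMap((m) => Object.entries(m));
  // group(): the entries of this user alone are merged into one map.
  return Object.fromEntries(entries);
}

// Models map(...): the inner traversal is evaluated once per incoming user.
const perUser = users.map(userToMap);

console.log(perUser.length); // one map per user
console.log(perUser);
```

Because the grouping happens inside the per-user function, entries from different users never meet in the same map, which is the scoping that map() provides in the traversal above.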

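Finally, you can shorten the final by() modulators as shown here. Note that this form is not strictly equivalent: as the comment below reports, the values can come back wrapped in lists, so keep the select() form if you need the original result shape: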

g.V().has("user", "userId", 1)
      .map(
        __.union(
           __.valueMap().by(__.unfold()),
           __.project('questions').by(
              __.outE('response').valueMap().by(__.unfold()).fold()
           )
        )
        .unfold()
        .group()
          .by(column.keys)
          .by(column.values)
      )
stephen mallette
  • `.group().by(__.select(column.keys)).by(__.select(column.values))` returns what I need, a list of maps, whereas `.group().by(column.keys).by(column.values)` returns a map with all the properties as arrays containing all the values of all the users, which is not what I needed, so the two are not the same. Other than that your solution works perfectly, thanks again!!! Gremlin is confusing because it seems map() is needed sometimes and sometimes not, depending on the kind of step you are calling. – fermmm Apr 12 '20 at 11:29
  • oh - sorry - i didn't pick up that subtlety of what you wanted. i tend to feel that confusion about Gremlin is related to confusion about streams which is why i wrote this answer the way that i did. I hope that helps clarify what's happening in your mind. it's not as though the need for `map()` is completely arbitrary - there are reasons to use it and reasons not to and those reasons tie back to the stream oriented nature of the language. – stephen mallette Apr 12 '20 at 11:44