
I'm trying to add a Subscription vertex ([Org-Sends->Subscription<-Receives-User]) for each [Org-Manages->User] where one is not already present. I wrote this traversal, which appears to select the values I'm interested in:

g.V().hasLabel('Org')
  .match(
    __.as('oV').out('Manages').as('uV'),
    not(__.as('oV').out('Sends').in('Receives').as('uV'))
  )

This returns a set of (oV, uV) tuples containing the relevant vertex pairs. So far, so good.

I then appended this:

.sideEffect(
  addV('Subscription').as('newV')
    .select('oV').addE('Sends').to(select('newV'))
    .select('uV').addE('Receives').to(select('newV'))
)

I expected, for each of the pairs oV,uV, to get the relevant structure set up. Instead, all of the Subscription vertices are created, and all of the Receives edges are created, but only one Sends edge per Org is created.
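To make the expected result concrete, here is a minimal Python sketch that models the intended migration on a toy in-memory graph. It is not Gremlin or Neptune code, and the `Graph` class and the `missing_pairs`/`add_subscriptions` helpers are hypothetical names used only for illustration. The property it checks is that every (oV, uV) pair, including pairs that share the same Org, gets its own Subscription with one Sends and one Receives edge, so the number of Sends edges should equal the number of pairs rather than the number of Orgs.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Graph:
    """Hypothetical toy graph: vertex labels plus labeled, directed edges."""
    labels: dict = field(default_factory=dict)   # vertex id -> label
    edges: list = field(default_factory=list)    # (out_id, label, in_id)
    _ids: count = field(default_factory=lambda: count(1000))

    def add_v(self, label):
        vid = next(self._ids)
        self.labels[vid] = label
        return vid

    def add_e(self, out_v, label, in_v):
        self.edges.append((out_v, label, in_v))

    def out(self, v, label):
        return [i for (o, lbl, i) in self.edges if o == v and lbl == label]

    def in_(self, v, label):
        return [o for (o, lbl, i) in self.edges if i == v and lbl == label]


def missing_pairs(g):
    """Models the match(): (org, user) pairs where Org-Manages->User exists
    but no Org-Sends->Subscription<-Receives-User path exists yet."""
    pairs = []
    for org in [v for v, lbl in g.labels.items() if lbl == 'Org']:
        # Users already reachable via this org's own Subscriptions.
        subscribed = {u for s in g.out(org, 'Sends') for u in g.in_(s, 'Receives')}
        for user in g.out(org, 'Manages'):
            if user not in subscribed:
                pairs.append((org, user))
    return pairs


def add_subscriptions(g):
    """Models the sideEffect(): one Subscription, one Sends edge and one
    Receives edge for every pair, even when the same org repeats."""
    pairs = missing_pairs(g)  # computed fully before any mutation
    for org, user in pairs:
        sub = g.add_v('Subscription')
        g.add_e(org, 'Sends', sub)
        g.add_e(user, 'Receives', sub)
    return pairs


if __name__ == "__main__":
    g = Graph()
    org = g.add_v('Org')
    users = [g.add_v('User') for _ in range(3)]
    for u in users:
        g.add_e(org, 'Manages', u)
    print(len(add_subscriptions(g)))                    # 3 pairs sharing one Org
    print(sum(1 for e in g.edges if e[1] == 'Sends'))   # 3 Sends edges, not 1
```

Running `add_subscriptions` a second time returns an empty list, which mirrors the "where one is not already present" requirement.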

As far as I can tell, the "input" to sideEffect is a set of "rows" with many duplicated Org vertices. I also tried inserting select('oV', 'uV') before the sideEffect, with no change in behavior. Why is the Sends addE, and only that addE, short-circuiting after it has seen a particular oV?


I am executing this against Amazon Neptune, and I ran explain in the console. This was the explanation I got, which appears to match my written traversal:

Final Traversal[
    NeptuneGraphQueryStep(Vertex),
    NeptuneTraverserConverterStep,
    MatchStep(
        AND,[
            [MatchStartStep(oV), VertexStep(OUT,[Manages],vertex), MatchEndStep(uV)],
            [MatchStartStep(oV), WhereTraversalStep([NotStep([WhereStartStep, VertexStep(OUT,[Sends],vertex), VertexStep(IN,[Receives],vertex), WhereEndStep(uV)])]), MatchEndStep, MatchEndStep]
        ]
    ),
    TraversalSideEffectStep([
        AddVertexStep({$migration=[asdf], label=[Subscription]})@[newV],
        SelectOneStep(last,oV),
        NoOpBarrierStep(2500),
        AddEdgeStep({~to=[[SelectOneStep(last,newV)]], label=[Sends]}),
        SelectOneStep(last,uV),
        NoOpBarrierStep(2500),
        AddEdgeStep({~to=[[SelectOneStep(last,newV)]], label=[Receives]})
    ])
]
chrylis -cautiouslyoptimistic-
  • I did quite a few experiments (shown below) but was unable to recreate what I believe you are describing. Perhaps you could take a look at my answer and see if there is some nuance I am missing that would make my experiment different. – Kelvin Lawrence Jul 25 '21 at 14:28
  • Thanks, @KelvinLawrence, and I'll try to take a look tomorrow. The production query is more complex, but only by attaching properties, and I confirmed with the `match`-part that the input to `sideEffect` does in fact return all of the expected pairs unrolled. – chrylis -cautiouslyoptimistic- Jul 25 '21 at 17:04

1 Answer


Using the air-routes data set, I attempted to simulate your query on both Amazon Neptune and TinkerGraph. In each case everything seems to work as intended: all of the edges I expected were created. Here is what I used for testing.

The air-routes data set includes some routes that have no matching return route, so I used those to simulate the cases where edges need to be created.

gremlin> g.V().has('airport','region','GB-ENG').
......1>   match(__.as('a').out('route').as('b'),
......2>         __.not(__.as('b').out('route').as('a'))).
......3>   select('a','b').by('code')
==>[a:MAN,b:ANU]
==>[a:LTN,b:ORY]  

Using that as a foundation, I added your sideEffect step. Note that there is an unwanted comma between the addE steps in your query, which I removed below.

gremlin> g.V().has('airport','region','GB-ENG').
......1>   match(__.as('a').out('route').as('b'),
......2>         __.not(__.as('b').out('route').as('a'))).
......3>   sideEffect(addV('Subscription').as('newV').
......4>              select('a').addE('Sends').to(select('newV')).
......5>              select('b').addE('Receives').to(select('newV')))

==>[a:v[84],b:v[219]]
==>[a:v[206],b:v[107]]

We can then verify that all of the expected structure was created.

gremlin> g.V().hasLabel('Subscription')
==>v[61367]
==>v[61370]


gremlin> g.V().hasLabel('Subscription').inE().outV().path()
==>[v[61367],e[61369][219-Receives->61367],v[219]]
==>[v[61367],e[61368][84-Sends->61367],v[84]]
==>[v[61370],e[61372][107-Receives->61370],v[107]]
==>[v[61370],e[61371][206-Sends->61370],v[206]]  

Note that the query can be rewritten without match, using a where step instead, which will likely result in a more efficient query execution plan.

gremlin> g.V().has('airport','region','GB-ENG').as('a').
......1>       out('route').as('b').
......2>       where(__.not(out('route').as('a'))).
......3>       select('a','b').
......4>       by('code')
==>[a:MAN,b:ANU]
==>[a:LTN,b:ORY] 
Kelvin Lawrence
  • Thanks for catching the spurious comma; that was a posting error here. I note that in your example, the traversal is both to-and-from the same kind of vertex (airport), and that it doesn't have the intermediate vertex that's being traversed through. You're certainly right about the option of collapsing to a `where`, but this is for a one-time migration, and the logic is more clearly readable to reviewers with the `match` syntax. – chrylis -cautiouslyoptimistic- Jul 25 '21 at 17:08