
For some context: I am currently using Azure Cosmos DB with the Gremlin API. Because of its storage-scaling architecture, a '.out()' operation is much less expensive than a '.in()' operation, so I always create edges in both directions and then pick whichever edge lets me use '.out()' for the direction I want to query.

We use the graph to associate events with users. Whenever a user 'U' raises an event 'E', we create two edges:

g.V('U').addE('raisedEvent').to(g.V('E'))
g.V('E').addE('raisedByUser').to(g.V('U'))

Very rarely, one of these queries fails for one reason or another and we end up with only a single edge between the two vertices. I've been trying to find a way to query for all vertices that have only a uni-directional relationship given a set of 'paired' edge-labels, in order to find these errors and re-create the missing edge.

Basically I need a query where...

  • given a pair of edge labels E1 (for outgoing, V1-->V2), E2 (for incoming V1<--V2)
  • finds all vertices V1 that have an outgoing edge E1 to another vertex V2 where V2 doesn't have an edge E2 going back to V1; and vice versa

Example:

// given a graph
g.addV('user').property('id','user_1')
g.addV('user').property('id','user_2')
g.addV('user').property('id','user_3')
g.addV('user').property('id','user_4')
g.addV('event').property('id','event_1')
g.addV('event').property('id','event_2')
g.addV('event').property('id','event_3')
g.addV('event').property('id','event_4')

g.V('user_1').addE('raisedEvent').to(g.V('event_1')).V('event_1').addE('raisedByUser').to(g.V('user_1'))
g.V('user_2').addE('raisedEvent').to(g.V('event_2')).V('event_2').addE('raisedByUser').to(g.V('user_2'))
g.V('user_2').addE('raisedEvent').to(g.V('event_3'))
g.V('event_4').addE('raisedByUser').to(g.V('user_3'))

// i.e.
//                (user_1) <--> (event_1)
// (event_2) <--> (user_2) ---> (event_3)
// (event_4) ---> (user_3)
//                (user_4)

// Then, the query should match with user_2 and user_3... 
// ...as they contain uni-directional links to events

Edit: Note that the Cosmos DB implementation of the 'is()' step doesn't accept a traversal as its argument, i.e. queries such as

where(__.outE('raisedEvent').count().is(__.out('raisedEvent').outE('raisedByUser').count()))

are currently unsupported in Cosmos DB.

If possible, it would also be great to get a list of which pairs of vertices have a bad link (e.g. in this case [(user_2, event_3), (user_3, event_4)]), but just knowing which vertices have a bad link will be very useful already.

David Makogon
Nines
  • I can come back and add a proper answer later, but to get you started, there is a pattern in Gremlin that can be used to find a vertex that has an adjacent vertex with no edge coming back the other way. It looks like this `g.V().as('a').out().where(__.not(out().as('a')))` I don't know if Cosmos DB supports that pattern. – Kelvin Lawrence Mar 04 '22 at 14:47
  • One additional comment? Have you considered creating both edges using the same query/transaction? That way they either both succeed or both fail. – Kelvin Lawrence Mar 04 '22 at 18:40
  • @KelvinLawrence A pattern like that was exactly what I was looking for, thanks a lot! I was already in the process of changing the insertion queries to add the edges in one query instead of multiple, but the 'damage' had already been done so to speak, so I needed a way to diagnose the single edges that were already made. – Nines Mar 06 '22 at 04:38
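
For reference, the single-query insertion discussed in the comments can be written in the same chained form the question already uses for its example data. This is only a sketch: 'U' and 'E' are the placeholder ids from the question, and whether the two writes are applied atomically depends on the database, which is not confirmed here for Cosmos DB.

```
// Sketch: create both edges in one traversal instead of two separate requests.
// 'U' and 'E' are placeholder vertex ids taken from the question.
g.V('U').addE('raisedEvent').to(g.V('E')).
  V('E').addE('raisedByUser').to(g.V('U'))
```

Sending one request at least removes the case where the first request succeeds and the second is never sent.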

1 Answer


Thanks to Kelvin Lawrence, I ended up using this pattern to get a list of vertex id pairs that are only uni-directionally connected from a to b:

g.V().hasLabel("user").as('a').out('raisedEvent').where(__.not(out('raisedByUser').as('a'))).as('b').select('a','b').by('id')
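
The query above only covers the user-to-event direction. A mirrored sketch for the opposite direction, followed by a repair of one reported pair, might look like the following. It assumes event vertices carry the label 'event' as in the question's example, and it has not been verified against Cosmos DB.

```
// Sketch: find (event, user) pairs where the event points at the user
// through 'raisedByUser' but the user has no 'raisedEvent' edge back.
g.V().hasLabel('event').as('a').
  out('raisedByUser').
  where(__.not(out('raisedEvent').as('a'))).as('b').
  select('a','b').by('id')

// Sketch: re-create the missing edge for one reported pair, here the
// (user_2, event_3) pair from the question's example graph.
g.V('event_3').addE('raisedByUser').to(g.V('user_2'))
```

On the example graph from the question, the mirrored query should pair event_4 with user_3, complementing the (user_2, event_3) pair found by the first query.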
Nines