
I'm new to Gremlin (until now I accessed Neptune through openCypher, and gave up because of how slow it was), and I'm getting really confused about a few things.

Basically, what I'm trying to do is this: say we have a graph A-->B-->C. There are multiple such graphs in the database, so I'm looking for the specific A, B, C nodes whose property 'idx' equals '1'. I want to add a node D{'idx' = '1'} and an edge, so that I end up with A-->B-->C-->D.

It is safe to assume A,B,C already exist and are connected together. Also, we wish to add D only if it doesn't already exist. So what I currently have is this:

g.V().
  hasLabel('A').has('idx', '1').
  out().hasLabel('B').has('idx', '1').
  out().hasLabel('C').has('idx', '1').as('c').
  V().hasLabel('D').has('idx', '1').fold().
  coalesce(
    unfold(), 
    addV('D').property('idx','1')).as('d').
  addE('TEST_EDGE').from('c').to('d')

Now the problem is that this doesn't work, and I don't understand Gremlin well enough to see why. Neptune returns "An unexpected error has occurred in Neptune" with the code "InternalFailureException".

Another thing to mention: if node D already exists, I don't get an error at all, and the node is in fact connected to the graph as it should be.

Furthermore, I've seen in a different post that using ".as('c')" shouldn't work, because the 'fold' step that follows makes the label unusable (for a reason I still don't understand, probably because I'm not sure how .as, .store and .aggregate work). That post suggests using ".aggregate('c')" instead, but doing so changes the returned error to "addE(TEST_EDGE) could not find a Vertex for from() - encountered: BulkSet". This, together with the fact that the code I wrote actually works and connects node D to the graph if D already exists, confuses me even more.

So I'm lost

Any help or clarification or explanation or simplification would be much appreciated

Thank you! :)

Kelvin Lawrence

1 Answer


A few comments before getting to the query:

  • If the intent is to have multiple subgraphs of (A->B->C), then you may not want to use this labeling scheme. Labels are meant to have low variation: think of a label as a group of vertices of the same "type".
  • Looking up a vertex by its ID is the fastest way to find a vertex in a TinkerPop-based graph database. Keep that in mind as you build your access patterns. Instead of doing something like `hasLabel('x').has('idx','y')`, if those two items combined identify a unique vertex, you may also want to think about creating a composite ID such as 'x-y' for that vertex for faster access/lookup.
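As a rough sketch of the composite-ID idea above: the ID format 'D-1' (label, hyphen, idx) is only an illustrative assumption, not something the original post defines, and it relies on the graph accepting user-supplied string IDs, which Neptune does.

```groovy
// Create the vertex with a user-supplied composite ID.
// 'D-1' is a hypothetical "<label>-<idx>" format chosen for illustration.
g.addV('D').
  property(T.id, 'D-1').
  property('idx', '1')

// Later, fetch it directly by ID instead of filtering on label + property.
g.V('D-1')
```

With an ID like this, the existence check no longer needs a label and property filter. It becomes a direct ID lookup.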

On the query...

The first part of the query looks good. I think you have a good understanding of the imperative nature of Gremlin right up until the second V() in the query. That V() tells Neptune to start evaluating against all vertices in the graph again, whereas we want to keep traversing from the 'C' vertex we just found.

Unless you need to return an output in either case of existence or non-existence, you could get away with just doing the following without a coalesce() step:

g.V().
  hasLabel('A').has('idx', '1').
  out().hasLabel('B').has('idx', '1').
  out().hasLabel('C').has('idx', '1').
  where(not(out().hasLabel('D').has('idx','1'))).
  addE('TEST_EDGE').to(
       addV('D').property('idx','1'))

The where() step lets us check for the non-existence of a downstream edge and vertex without losing our place in the traversal. Because the condition is wrapped in not(), the traversal only continues if no matching 'D' vertex is found. In that case it carries on from where we left off (the 'C' vertex), so we can feed that 'C' vertex directly into an addE() step to create our new edge and new 'D' vertex.
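If you do need the 'D' vertex returned whether or not it already existed, one possible variation keeps the coalesce() but drops the second V() and the fold(). This is a sketch, not a tested Neptune query. It assumes, like the query above, that "D exists" means "a 'D' with idx '1' is already attached to this 'C'". It also relies on the 'c' label still being visible inside the coalesce() child traversals, which would not be the case after a reducing step such as fold(), since fold() discards the path history that as('c') depends on.

```groovy
g.V().
  hasLabel('A').has('idx', '1').
  out().hasLabel('B').has('idx', '1').
  out().hasLabel('C').has('idx', '1').as('c').
  coalesce(
    // Case 1: a matching D is already attached to C, so return it.
    out().hasLabel('D').has('idx', '1'),
    // Case 2: create D, then add the edge from the labelled C to the new D.
    // With no to() given, addE() targets the current traverser (the new D).
    addV('D').property('idx', '1').
      addE('TEST_EDGE').from('c').
      inV())
```

Either branch ends on the 'D' vertex, so the query returns it in both cases.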

Taylor Riggan
  • Wow, thank you! This teaches me a lot, and I will implement some of your good ideas. I forgot to mention in my original post (will edit) that I need to create several new nodes this way: in my example, I want to add multiple D nodes to that specific C node, checking for each D node whether it already exists. That is why I wanted to save C with .as('c'). Will I be able to use the same concept as in your example? – Oded Answer Aug 16 '22 at 17:03