We are using the Gremlin JavaScript language variant with Amazon Neptune in our project, and we have multiple use cases that create vertices and edges in batches.

A simple example would be an array of 200 to 1000 users. I need a batch query that checks whether each user already exists. If the user does not exist, add a vertex with its properties; otherwise ignore that user. All of these checks need to be done in batch.

Note: Gremlin scripts need to be avoided, so I am looking for a traversal-based solution.

Kelvin Lawrence
codegutsy
  • Can you share the Gremlin that you have tried so far? I assume you are wanting to combine `inject` and `coalesce` to do this? – Kelvin Lawrence Oct 28 '21 at 02:04
  • I am new to Gremlin, so I don't know the purpose of `inject` and `coalesce`. I tried simple batching, for example: g.addV("User").property(id, "user1").property("name", "user1").as("user1"). addV("User").property(id, "user2").property("name", "user2").as("user2").next(). This chain can grow to between 200 and 1000 users, and the query adds vertices without checking whether the data already exists in the graph. – codegutsy Oct 28 '21 at 03:57
  • I'll try to write up an example as an answer soon. Essentially you can `inject` a map into a traversal and use it to populate vertices and/or edges. The `coalesce` step helps with upsert patterns such as "create if not exists" – Kelvin Lawrence Oct 28 '21 at 15:21

1 Answer

It is possible to seed a query with a list of maps containing the data to be inserted. You can extend the pattern with a `coalesce` step to do conditional inserts. Here is a simple example using the air-routes data set: it creates a new XYZ airport and finds that the other airports already exist. Note that the mid-traversal `V` step makes this a somewhat expensive query because, for each map in the list, all vertices have to be "searched".

g.inject([['code':'AUS'],['code':'XYZ'],['code':'SFO']]).
  unfold().as('data').
  coalesce(V().hasLabel('airport').
    where(eq('data')).
      by('code').
      by(select('code')),
    addV('airport').
      property('code',select('code')))

The TinkerPop recipes documentation has further discussion of using this pattern to avoid long chains of addV and addE steps in a query:

https://tinkerpop.apache.org/docs/current/recipes/#long-traversals
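For the users case in the question, the input list could be prepared on the client before it is injected. The following is only a sketch in plain JavaScript: the helper names `toSeedMaps` and `chunk`, the batch size of 200, and the generated sample users are all assumptions made for illustration. It uses `Map` instances on the assumption that these are what the JavaScript driver sends as Gremlin maps.

```javascript
// Hypothetical helper: convert plain user objects into Maps, one per user,
// using the "id" and "name" fields from the example in the comments above.
function toSeedMaps(users) {
  return users.map((u) => new Map([['id', u.id], ['name', u.name]]));
}

// Hypothetical helper: split a list into batches of at most `size` items,
// so that each query is seeded with a bounded amount of data.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Sample data only: 450 made-up users named user1 ... user450.
const users = Array.from({ length: 450 }, (_, i) => ({
  id: `user${i + 1}`,
  name: `user${i + 1}`,
}));

// Batch size 200 is an arbitrary choice for this sketch.
const batches = chunk(toSeedMaps(users), 200);
console.log(batches.map((b) => b.length)); // [ 200, 200, 50 ]
```

Each batch could then be passed to a single `inject` step. A suitable batch size would need to be found by testing against the actual graph.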

When the query is run, the output shows that a new ID (61286) is created for the XYZ airport and that the existing IDs are returned for AUS and SFO.

gremlin> g.inject([['code':'AUS'],['code':'XYZ'],['code':'SFO']]).
......1>   unfold().as('data').
......2>   coalesce(V().hasLabel('airport').
......3>     where(eq('data')).
......4>       by('code').
......5>       by(select('code')),
......6>     addV('airport').
......7>       property('code',select('code')))
==>v[3]
==>v[61286]
==>v[23]    
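Since the question asks about the JavaScript language variant, here is a rough translation of the same traversal. Treat it as an untested sketch rather than a verified Neptune solution: the function names `toCodeMaps` and `upsertAirports` are made up for illustration, the endpoint in the comment is a placeholder, and `g`, `__` and `P` are passed in as parameters so that the function does not depend on how the connection is created.

```javascript
// With the `gremlin` npm package, the three arguments would typically be
// obtained like this (the endpoint is a placeholder, not a real host):
//   const gremlin = require('gremlin');
//   const __ = gremlin.process.statics;
//   const { P } = gremlin.process;
//   const g = gremlin.process.AnonymousTraversalSource.traversal()
//     .withRemote(new gremlin.driver.DriverRemoteConnection('wss://<host>:8182/gremlin'));

// Hypothetical helper: build one Map per airport code. Maps are used on the
// assumption that the driver sends them to the server as Gremlin maps.
function toCodeMaps(codes) {
  return codes.map((code) => new Map([['code', code]]));
}

// Hypothetical function: same shape as the console query above.
// Anonymous traversals start from `__` and predicates come from `P`.
function upsertAirports(g, __, P, codes) {
  return g.inject(toCodeMaps(codes)) // the whole list is injected as one value
    .unfold().as('data')
    .coalesce(
      // Branch 1: an airport whose code matches the current map's code.
      __.V().hasLabel('airport')
        .where(P.eq('data'))
        .by('code')
        .by(__.select('code')),
      // Branch 2: nothing matched, so create the airport.
      __.addV('airport').property('code', __.select('code'))
    )
    .toList(); // a Promise resolving to the found or created vertices
}

// Usage, once g, __ and P exist:
//   upsertAirports(g, __, P, ['AUS', 'XYZ', 'SFO']).then(console.log);
```

The main differences from the console version are that the step arguments are built from `__` and `P`, and that the terminal step returns a Promise.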
Kelvin Lawrence
  • Thank you. It worked as expected in the Gremlin console. Now I am trying to implement it using the Gremlin JavaScript language variant. – codegutsy Nov 01 '21 at 12:49