inject() with a list of strings to be upserted as vertices takes a very long time in Gremlin + Neptune

Question

While optimizing a service that has AWS Neptune as a back-end I stumbled upon the recommendation to use inject(), so I decided to batch a set of queries to avoid round-trips to Neptune:

g.inject(Array.from(attributes))
        .unfold()
        .as(SELECTOR_NAME)
        .coalesce(
            __.V()
                .hasLabel(NodeType.Attribute)
                .has(NodeType.propName.tenantId, this.tenantId)
                .has(NodeType.propName.code, __.where(Predicate.eq(SELECTOR_NAME))),
            __.addV(NodeType.Attribute)
                .property(NodeType.propName.created, new Date())
                .property(NodeType.propName.tenantId, this.tenantId)
                .property(NodeType.propName.updated, new Date())
                .property(NodeType.propName.code, __.identity())
                .property(NodeType.propName.title, __.identity()) // TODO capitalize or get from source
        )
        .project(NodeType.propName.code, NodeType.propName.id)
            .by(__.values(NodeType.propName.code))
            .by(__.id())
        .toList();

However, for only 27 items this takes between 90 and 100 seconds, which is slower than doing individual upserts. I understand that there could be many things wrong here, but I want to rule out an inefficiency in the query itself.

The stack is TypeScript + Node.js + gremlin + AWS Neptune, but the query also takes about 90 seconds when run from a SageMaker notebook.

Below is a sample query that should work everywhere:

    g.inject(["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25"]).
      unfold().
      as('a').
      coalesce(
        __.V().
          hasLabel('attribute').
          has('tenantId', 'spm').
          has('code', __.where(eq('a'))),
        __.addV('attribute').
          property('created', 'new Date()').
          property('tenantId', 'spm').
          property('updated', 'new Date()').
          property('code', __.identity()).
          property('title', __.identity())
      ).
      project("code", "id").by(__.values("code")).by(__.id()).
      toList()

Moving the Vertex filters on label and tenantId before the coalesce had no effect. — Jeroen Vlek, Sep 21 '21 at 14:27

Jeroen Vlek · Accepted Answer · 2021-09-21T20:33:08.107

It took a lot of trial and error, but the main solution was to use withSideEffect() instead of inject(). The response time of the query below was 72 ms:

    g.withSideEffect("aList", ["a", "b", "c", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "x", "y", "z"]).
      V().
      hasLabel('test_attribute').
      has('tenantId', 'spm').
      fold().
      as('v').
      select('aList').
      unfold().
      as('a').
      map(
        __.coalesce(
          __.select('v').unfold().has("code", where(eq('a'))),
          __.addV('test_attribute').
            property('created', 'new Date()').
            property('tenantId', 'spm').
            property('updated', 'new Date()').
            property('code', __.identity()).
            property('title', __.identity())
        )
      ).
      project("code", "id").by(__.values("code")).by(__.id()).
      toList()
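
The structural difference between the two queries is that the inject() version starts a new V() lookup inside coalesce() for every unfolded item, while the withSideEffect() version runs V() once, folds the matching vertices, and reuses that folded list for every item. Below is a minimal in-memory sketch of that difference in TypeScript. It is a toy model only: `Vertex`, `addVertex`, `upsertWithScanPerItem` and `upsertWithSingleScan` are hypothetical names, and the counters say nothing about how Neptune actually plans or executes either traversal.

```typescript
// Toy in-memory model of the two traversal shapes. Not Neptune's engine.
interface Vertex { id: number; label: string; tenantId: string; code: string }
interface Row { code: string; id: number }
interface UpsertResult { rows: Row[]; graphScans: number; verticesRead: number }

// Hypothetical helper: appends a vertex with a sequential id.
function addVertex(graph: Vertex[], label: string, tenantId: string, code: string): Vertex {
  const v: Vertex = { id: graph.length + 1, label, tenantId, code };
  graph.push(v);
  return v;
}

// Shape of the inject() query: coalesce(V()..., addV()) runs per item,
// so every code triggers its own pass over the graph.
function upsertWithScanPerItem(
  graph: Vertex[], codes: string[], label: string, tenantId: string
): UpsertResult {
  let graphScans = 0;
  let verticesRead = 0;
  const rows: Row[] = [];
  for (const code of codes) {
    graphScans++;
    verticesRead += graph.length;
    const matches = graph.filter(
      v => v.label === label && v.tenantId === tenantId && v.code === code
    );
    const hits = matches.length > 0 ? matches : [addVertex(graph, label, tenantId, code)];
    for (const v of hits) rows.push({ code: v.code, id: v.id });
  }
  return { rows, graphScans, verticesRead };
}

// Shape of the withSideEffect() query: V().hasLabel().has().fold() runs once,
// then each code is matched against the folded list.
function upsertWithSingleScan(
  graph: Vertex[], codes: string[], label: string, tenantId: string
): UpsertResult {
  const verticesRead = graph.length; // size of the single pass, taken before any adds
  const folded = graph.filter(v => v.label === label && v.tenantId === tenantId);
  const rows: Row[] = [];
  for (const code of codes) {
    const matches = folded.filter(v => v.code === code);
    const hits = matches.length > 0 ? matches : [addVertex(graph, label, tenantId, code)];
    for (const v of hits) rows.push({ code: v.code, id: v.id });
  }
  return { rows, graphScans: 1, verticesRead };
}
```

In this model the folded list is captured before any vertex is added, so a code that appears twice in the input would be added twice by the single-scan version. That should not matter when the input comes from a set of unique codes, but it is worth checking against the real query.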
