While optimizing a service that uses AWS Neptune as its back end, I came across the recommendation to use inject(), so I decided to batch a set of queries to avoid round trips to Neptune:
g.inject(Array.from(attributes))
  .unfold()
  .as(SELECTOR_NAME)
  .coalesce(
    __.V()
      .hasLabel(NodeType.Attribute)
      .has(NodeType.propName.tenantId, this.tenantId)
      .has(NodeType.propName.code, __.where(Predicate.eq(SELECTOR_NAME))),
    __.addV(NodeType.Attribute)
      .property(NodeType.propName.created, new Date())
      .property(NodeType.propName.tenantId, this.tenantId)
      .property(NodeType.propName.updated, new Date())
      .property(NodeType.propName.code, __.identity())
      .property(NodeType.propName.title, __.identity()) // TODO capitalize or get from source
  )
  .project(NodeType.propName.code, NodeType.propName.id)
    .by(__.values(NodeType.propName.code))
    .by(__.id())
  .toList();
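To make the intent of the traversal above explicit, here is a minimal in-memory sketch in plain TypeScript of what I expect it to do for each injected code: return the existing attribute for that tenant and code, or create it. This is only a model of the semantics, not Gremlin and not how Neptune evaluates the query; `AttributeVertex`, `upsertAttributes`, the array-backed store, and the sequential ids are all made up for illustration.

```typescript
// In-memory model of the batched upsert. Not Gremlin, not Neptune.
interface AttributeVertex {
  id: number; // made-up sequential id; Neptune assigns its own ids
  label: string;
  tenantId: string;
  code: string;
  title: string;
  created: Date;
  updated: Date;
}

// Hypothetical helper mirroring:
// inject(codes).unfold().coalesce(<find>, <create>).project("code", "id")
function upsertAttributes(
  store: AttributeVertex[],
  tenantId: string,
  codes: readonly string[],
): { code: string; id: number }[] {
  const results: { code: string; id: number }[] = [];
  for (const code of codes) {
    // First coalesce branch: all attributes matching label, tenant and code.
    let matches = store.filter(
      (v) => v.label === "attribute" && v.tenantId === tenantId && v.code === code,
    );
    if (matches.length === 0) {
      // Second coalesce branch: create the attribute; the code doubles as the title.
      const now = new Date();
      const vertex: AttributeVertex = {
        id: store.length + 1,
        label: "attribute",
        tenantId: tenantId,
        code: code,
        title: code,
        created: now,
        updated: now,
      };
      store.push(vertex);
      matches = [vertex];
    }
    // project(code, id): one result row per vertex that came out of coalesce.
    for (const v of matches) {
      results.push({ code: v.code, id: v.id });
    }
  }
  return results;
}

const store: AttributeVertex[] = [];
const first = upsertAttributes(store, "spm", ["1", "2", "3"]);
const second = upsertAttributes(store, "spm", ["2", "3", "4"]);
```

In this model the second call reuses the vertices for "2" and "3" and only creates "4", which is the behavior I am after with a single round trip.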
However, for only 27 items this takes between 90 and 100 seconds, which is slower than doing individual upserts. I understand that there could be many things wrong here, but I want to rule out an inefficiency in the query itself.
The stack is TypeScript + Node.js + gremlin + AWS Neptune, but the query also takes about 90 seconds when run from a SageMaker notebook.
Below is a sample query that should work everywhere:
g.inject(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25']).
  unfold().
  as('a').
  coalesce(
    __.V().
      hasLabel('attribute').
      has('tenantId', 'spm').
      has('code', __.where(eq('a'))),
    __.addV('attribute').
      property('created', 'new Date()').
      property('tenantId', 'spm').
      property('updated', 'new Date()').
      property('code', __.identity()).
      property('title', __.identity())
  ).
  project('code', 'id').
    by(__.values('code')).
    by(__.id()).
  toList()