Apply customized transformation logic when replicating data from Neptune to AWS Elasticsearch Service

Question

Amazon Neptune now supports Full-Text Search Using Amazon Elasticsearch Service. It automatically replicates data from Neptune to Elasticsearch. My question is: Does Neptune support customized transformation logic during the replication? For example, I have a vertex in Neptune like Brand(id=123, name="Calvin Klein"), and I want to apply customized transformation logic to the vertex so that this vertex will be transformed to a document {id:123, name:"Calvin Klein", normalizedName:"calvinklein"} in Elasticsearch.
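The question does not define "normalized". One plausible rule, assumed here, is to case-fold the name and keep only letters and digits. Below is a minimal Python sketch of that rule. normalize_name is a hypothetical helper, not part of Neptune.

```python
def normalize_name(name: str) -> str:
    """Case-fold a name and keep only its letters and digits.

    This is an assumed normalization rule, chosen so that it reproduces
    the "Calvin Klein" -> "calvinklein" example from the question.
    """
    return "".join(ch for ch in name.casefold() if ch.isalnum())


print(normalize_name("Calvin Klein"))  # calvinklein
```

Using str.isalnum() rather than an ASCII-only pattern keeps accented letters such as "ü" instead of silently dropping them.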

Easily done by changing the Painless script in the default Neptune Streams poller Lambda function. — Pulathisi Bandara, Oct 17 '20 at 04:00

score 1 · Accepted Answer · answered May 19 '20 at 02:41

Is custom logic supported?

Yes, the Neptune-to-ES replication process does allow users to supply their own transformation logic for storing data in Elasticsearch. Be careful, though: changing the existing ES document may break full-text search through ES from Neptune Gremlin, because that search needs all of the original fields in the document. You can always add new fields to the Elasticsearch document if needed.
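Before you deploy custom logic, it can help to check mechanically that a transformed document only adds fields and leaves the original ones untouched. The Python sketch below does that comparison recursively. The helper name is hypothetical. The sample documents use "predicates" as described in the answer, and "entity_id" is only an illustrative field name.

```python
from typing import Any


def preserves_original_fields(original: Any, transformed: Any) -> bool:
    """Return True if every field of `original` appears unchanged in `transformed`.

    Dicts in `transformed` may carry extra keys (new fields are allowed);
    every other value, including lists, must compare equal.
    """
    if isinstance(original, dict):
        if not isinstance(transformed, dict):
            return False
        return all(
            key in transformed and preserves_original_fields(value, transformed[key])
            for key, value in original.items()
        )
    return original == transformed


original = {"entity_id": "123", "predicates": {"name": [{"value": "Calvin Klein"}]}}
extended = {
    "entity_id": "123",
    "predicates": {
        "name": [{"value": "Calvin Klein"}],
        "normalizedName": [{"value": "calvinklein"}],  # new field only
    },
}
print(preserves_original_fields(original, extended))  # True
```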

Please refer to the Neptune data model for Elasticsearch.

Can a custom ES field be used in a Gremlin FTS query?

A custom ES field can be used in a Gremlin FTS query if it is stored the same way a Gremlin property is stored in the ES document, i.e. as a nested field inside the "predicates" field.

Ex: if {normalizedName:"calvinklein"} is to be added as a custom field, the ES document should store it as:

{
  .......  // Original fields
  "predicates": {
    .......  // Original properties
    "normalizedName": [
      { "value": "calvinklein" }
    ]
  }
}
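Here is a Python sketch of adding the custom field in that nested layout. add_custom_predicate is a hypothetical helper. It works on a copy, so the original document is not mutated, and it refuses to overwrite an existing predicate. This follows the advice to keep all original fields.

```python
import copy


def add_custom_predicate(doc: dict, name: str, value: str) -> dict:
    """Return a copy of `doc` with `name` stored like a Gremlin property.

    The value is placed under doc["predicates"][name] as a list of
    {"value": ...} objects. Existing predicates are never overwritten.
    """
    new_doc = copy.deepcopy(doc)
    predicates = new_doc.setdefault("predicates", {})
    if name in predicates:
        raise ValueError(f"predicate {name!r} already exists; refusing to overwrite it")
    predicates[name] = [{"value": value}]
    return new_doc


doc = {"entity_id": "123", "predicates": {"name": [{"value": "Calvin Klein"}]}}
print(add_custom_predicate(doc, "normalizedName", "calvinklein")["predicates"])
```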

This field can be searched like any other Gremlin property, but only through a Neptune FTS query (the field exists in ES, not in Neptune).
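The string built below shows what a Neptune FTS query against the custom field might look like. It assumes the Neptune#fts.endpoint side effect and the Neptune#fts predicate prefix from the Neptune full-text search documentation. The endpoint is a placeholder. fts_query is a hypothetical helper that only builds the query text and does not send it.

```python
def fts_query(es_endpoint: str, field: str, text: str) -> str:
    """Build a Gremlin query string that searches `field` through Neptune FTS."""
    # This sketch does not escape quotes, so it rejects inputs containing them.
    for part in (es_endpoint, field, text):
        if "'" in part:
            raise ValueError("single quotes are not supported by this sketch")
    return (
        f"g.withSideEffect('Neptune#fts.endpoint', '{es_endpoint}')"
        f".V().has('{field}', 'Neptune#fts {text}')"
    )


print(fts_query("your-es-endpoint-URL", "normalizedName", "calvinklein"))
```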

How can I add custom transformation logic?

The sample you are referring to uses a Python-based AWS Lambda handler to replicate data from Neptune to Elasticsearch using Neptune Streams. It reads the change logs from Neptune Streams and converts them into ES upsert requests.
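The transformation step itself could look roughly like the Python sketch below. It assumes the Gremlin change-log record shape: op, data.id, data.type with "vp" for vertex properties, data.key, and data.value.value. Check that shape against the records your stream actually returns. extra_predicates is hypothetical. It computes only the normalizedName predicate per vertex and does not send the ES request.

```python
from collections import defaultdict


def _normalize(name: str) -> str:
    # Assumed rule: case-fold and keep only letters and digits.
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def extra_predicates(records: list) -> dict:
    """Map vertex id -> custom predicates derived from 'name' property records.

    Assumes Gremlin-format stream records, where data["type"] == "vp" marks a
    vertex property, data["key"] is the property name, and
    data["value"]["value"] is its value. Only ADD operations are considered.
    """
    result = defaultdict(dict)
    for record in records:
        data = record.get("data", {})
        if record.get("op") != "ADD" or data.get("type") != "vp" or data.get("key") != "name":
            continue
        values = result[data["id"]].setdefault("normalizedName", [])
        values.append({"value": _normalize(data["value"]["value"])})
    return dict(result)


records = [
    {"op": "ADD", "data": {"id": "123", "type": "vl", "key": "label",
                           "value": {"value": "Brand", "dataType": "String"}}},
    {"op": "ADD", "data": {"id": "123", "type": "vp", "key": "name",
                           "value": {"value": "Calvin Klein", "dataType": "String"}}},
]
print(extra_predicates(records))
```

REMOVE records are ignored here. If a name is removed or replaced, you would need extra handling to drop the stale normalizedName value from the ES document.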

One can provide their own implementation of the Lambda poller handler for processing stream records. Please follow the blog post below for details:

https://aws.amazon.com/blogs/database/capture-graph-changes-using-neptune-streams/
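The Python sketch below shows one possible shape for a custom handler class. The real AbstractHandler lives in the polling framework from the blog post, so this sketch defines a small stand-in base class to run on its own. The handle_records(stream_log) signature, the "records" key, and the return value are assumptions, not the framework's documented interface.

```python
from abc import ABC, abstractmethod


# Hypothetical stand-in for the polling framework's AbstractHandler so this
# sketch is self-contained; the real class and its interface may differ.
class AbstractHandler(ABC):
    @abstractmethod
    def handle_records(self, stream_log: dict):
        ...


class NormalizedNameHandler(AbstractHandler):
    """Hypothetical custom handler that derives normalizedName values."""

    def __init__(self):
        self.pending = {}  # vertex id -> normalized name, to be written to ES

    def handle_records(self, stream_log: dict):
        # Assumes the batch of change records sits under stream_log["records"].
        for record in stream_log.get("records", []):
            data = record.get("data", {})
            if record.get("op") == "ADD" and data.get("type") == "vp" and data.get("key") == "name":
                value = data["value"]["value"]
                self.pending[data["id"]] = "".join(c for c in value.casefold() if c.isalnum())
        # A real handler would now write self.pending to Elasticsearch and
        # report progress back to the framework in the way it expects.
        return self.pending
```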

As per the blog post linked above, follow these steps:

Create a customized stream handler class that inherits from the polling framework's AbstractHandler and implements a handle_records() method.
Once you've finished authoring your stream handler, create a deployment package in the form of a ZIP archive and upload it to S3.
To use the new ZIP archive, supply the LambdaS3Bucket and LambdaS3Key for your handler package in the CloudFormation (CFN) template.
Also, supply the name of your StreamRecordsHandler, i.e. the customized handler you have created.
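The packaging and deployment steps can also be sketched in Python:

- build_package zips a source directory with the standard zipfile module.
- upload_package uses boto3's upload_file.
- stack_parameters builds a Parameters list in the shape that boto3's CloudFormation create_stack and update_stack calls accept.

The parameter names come from the steps above. The bucket, key, and handler values are hypothetical. Check the blog post for the exact format the template expects for StreamRecordsHandler.

```python
import zipfile
from pathlib import Path


def build_package(src_dir: str, zip_path: str) -> list:
    """Zip every file under src_dir (stored relative to it); return the entry names."""
    src = Path(src_dir)
    target = Path(zip_path).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself if it lives inside src_dir.
            if path.is_file() and path.resolve() != target:
                arcname = path.relative_to(src).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def upload_package(zip_path: str, bucket: str, key: str) -> None:
    """Upload the archive to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("s3").upload_file(zip_path, bucket, key)


def stack_parameters(bucket: str, key: str, handler: str) -> list:
    """CloudFormation parameters for the handler package, as ParameterKey/Value pairs."""
    values = {"LambdaS3Bucket": bucket, "LambdaS3Key": key, "StreamRecordsHandler": handler}
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```

For example, call build_package("handler_src", "handler.zip") and then upload_package("handler.zip", "my-bucket", "handlers/handler.zip"). Pass stack_parameters("my-bucket", "handlers/handler.zip", "my_handler.NormalizedNameHandler") as Parameters when creating or updating the stack. All of these names are placeholders.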
