I have a running AWS Neptune graphDB which is being used in a production environment. I have since identified new vertices that I would like to add that will connect to specific existing vertices in the DB.

I loaded the original set after splitting it up with the 'csv-to-neptune-bulk-format' script from https://github.com/awslabs/amazon-neptune-tools/tree/master/csv-to-neptune-bulk-format .

My question is: how can I bulk load my additional set in the most efficient way? I have two ideas on how to approach this, but I'm hoping that someone knows a simpler way.

Approach 1 is to use the above 'csv-to-neptune-bulk-format' script to split up the additional set and then bulk load that. I will then have duplicate vertices wherever the new set overlaps with the original, because the script assigns new vertex IDs to the vertices where the new set connects to the original set. I have a function to then merge these duplicate vertices. This approach can be quite resource intensive, though.

Approach 2 is to split up the additional set with the above script and then, in the generated edge CSV, replace the IDs of the connecting vertices for the edges that link the original set to the additional set. So basically each affected row in the edge CSV will change from [~id,~label,~from,~to] to [~id,~label,<corresponding vertex ID generated during the first bulk upload>,~to].
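The ID replacement in Approach 2 could be sketched roughly as follows. This is only an illustration under assumptions: the function name `remap_edge_endpoints`, the sample IDs, and the `id_map` dictionary (newly generated ID to ID already in the graph) are hypothetical, and the column order simply follows the [~id,~label,~from,~to] layout mentioned above.

```python
import csv
import io


def remap_edge_endpoints(edge_csv_text, id_map):
    """Rewrite ~from/~to values found in id_map; leave all other values unchanged.

    id_map is a hypothetical lookup from an ID generated in the new run
    to the ID that the same vertex already has in the graph.
    """
    reader = csv.DictReader(io.StringIO(edge_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["~from"] = id_map.get(row["~from"], row["~from"])
        row["~to"] = id_map.get(row["~to"], row["~to"])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: tmp-1 is a regenerated ID for a vertex that
# already exists in the graph as v-42; tmp-2 is a genuinely new vertex.
new_edges = "~id,~label,~from,~to\ne-100,knows,tmp-1,tmp-2\n"
id_map = {"tmp-1": "v-42"}

remapped = remap_edge_endpoints(new_edges, id_map)
print(remapped)
```

The duplicate rows for the overlapping vertices would also need to be dropped from the generated vertex CSV, which this sketch does not cover.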

I'm hoping that I've missed some documentation or logic somewhere that will allow me to reuse the existing vertex IDs, so that I can simply bulk load the newly processed vertex CSV together with the edge CSV that connects the new vertices to the original vertices. Any help or advice will be greatly appreciated.

d95l

1 Answer

The bulk loader can be used for more than just a first-time load into an empty graph. You can use it to add new nodes and edges, and to update existing nodes and edges where you need to add new properties or replace the value of an existing (single cardinality) property.

I have not used the csv-to-neptune-bulk-format tool; I typically generate the Neptune CSV format for nodes and edges directly.
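As a rough illustration of generating the files directly, here is a minimal Python sketch that writes one new node and one edge whose `~from` value is the ID of a vertex assumed to be in the graph already. All IDs, labels, and the `name:String` property are made-up examples, and `to_csv` is a hypothetical helper; only the `~id`, `~label`, `~from`, and `~to` system column headers come from the Neptune CSV format.

```python
import csv
import io


def to_csv(header, rows):
    """Serialize a header and rows to CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Assumption: a vertex with this ID was created by the original load.
EXISTING_VERTEX_ID = "person-1"

# One new vertex with an ID chosen by us rather than by a conversion tool.
nodes_csv = to_csv(
    ["~id", "~label", "name:String"],
    [["city-9", "City", "Cape Town"]],
)

# One new edge that connects the existing vertex to the new vertex.
edges_csv = to_csv(
    ["~id", "~from", "~to", "~label"],
    [["e-person-1-city-9", EXISTING_VERTEX_ID, "city-9", "livesIn"]],
)

print(nodes_csv)
print(edges_csv)
```

Because the edge file refers to the existing vertex by its ID, that vertex does not need to appear in the new node file at all.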

Can you say a bit more about the current format of the data you want to ingest and why you need to ETL it using that tool? If you can add a bit more info I will update this answer accordingly.

Kelvin Lawrence
  • Thank you for your response @KelvinLawrence. The data I want to ingest is a local CSV file. My overall workflow up until now involved saving data from the DB, converting it to the vertex and edge CSV files with the above-mentioned tool, uploading them to S3 and then loading them into Neptune. I decided on this initially because I wanted to make sure everything was correct and in the right format, as I understood it, before uploading it to an S3 bucket. – d95l Jul 25 '22 at 08:17
  • So I've narrowed my issue down. When I run the script to create the node and edge CSV files, it does not keep track of the ID it previously assigned to a vertex. In simple terms, if I run the above script twice in a row, the ID given to the first vertex in the first run will differ from the one it gets in the second run. – d95l Jul 25 '22 at 10:42
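One possible way around IDs that change between runs is to derive each vertex ID deterministically from a stable natural key in the source data. The sketch below is not how the csv-to-neptune-bulk-format tool works; it is a swapped-in technique using name-based UUIDs (`uuid.uuid5`) from the Python standard library. The namespace URL, the `vertex_id` function, and the label/key values are all hypothetical.

```python
import uuid

# Hypothetical namespace for this graph; any fixed value works as long
# as it never changes between runs.
GRAPH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-graph")


def vertex_id(label, natural_key):
    """Return the same ID every time for the same (label, natural_key) pair."""
    return str(uuid.uuid5(GRAPH_NAMESPACE, f"{label}:{natural_key}"))


# Running this in two separate script runs yields identical IDs, so an
# edge file generated later can point at vertices loaded earlier.
first_run = vertex_id("Person", "alice@example.com")
second_run = vertex_id("Person", "alice@example.com")
other = vertex_id("Person", "bob@example.com")

print(first_run == second_run)  # same key, same ID
print(first_run == other)       # different key, different ID
```

This only helps if every vertex has a key that is unique and stable in the source CSV; vertices already loaded with tool-generated IDs would still need a one-time mapping.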