I currently use Neptune's bulk loader from Python to load CSV data into a Neptune database. Until now the files contained only modified records, so I only ever inserted data.

Now the files contain full/historical records, and the requirement is to delete all existing edges for each record in the CSV file and then reload all of its edges from the file. My plan is to delete the edges from the DB based on a certain property and then call Neptune's bulk loader.

I have edges with several different labels, all loaded from CSV files kept in an S3 bucket, so I may have to delete edges of multiple labels before I can start loading the CSV files.

However, if I delete the edges and the CSV bulk load then fails, I have no way to get the deleted data back.

How do I make sure that the delete and the bulk insert happen within a single transaction? Does Neptune offer such an option? I am using Gremlin with Python.
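For context, the delete and the load are two separate calls: a Gremlin traversal for the delete and an HTTP request to the loader endpoint for the load. The sketch below is only an illustration of that split. It assumes `g` is an already-connected gremlinpython traversal source, that IAM database authentication is not enabled (otherwise the request would need SigV4 signing), and that the label, property name, endpoint, bucket, and role ARN are placeholders.

```python
import json
import urllib.request


def drop_edges(g, label, key, value):
    """Drop every edge with the given label whose property `key` equals `value`.

    `g` is assumed to be a gremlinpython GraphTraversalSource. Older
    gremlinpython releases spell the step `hasLabel` instead of `has_label`.
    """
    g.E().has_label(label).has(key, value).drop().iterate()


def build_load_request(endpoint, s3_uri, iam_role_arn, region):
    """Build (but do not send) the POST request that starts a bulk load."""
    body = {
        "source": s3_uri,            # S3 URI of the CSV file or folder
        "format": "csv",             # Gremlin CSV format
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read the bucket
        "region": region,
        "failOnError": "TRUE",       # stop the load on the first error
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` returns a load ID that can be polled at `/loader/<loadId>`. Nothing ties the traversal and the load together, which is exactly the gap the question is about.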

user2026504

1 Answer

TL;DR: No, you cannot do this in Neptune.

You can, however, get a similar safety net through a series of steps:

  1. Take a snapshot of the database.
  2. Delete the edges.
  3. Start the bulk load.

If the bulk load finishes successfully, you are done.

If the bulk load fails, you can use the snapshot taken in step 1 to create a new cluster that holds the same data as before.

This solution assumes you can accept the downtime of creating a new cluster, and that your application can be pointed at the new cluster's endpoint.
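The three steps above could be sketched roughly as follows. This is a minimal illustration, not a tested production routine: `neptune` is assumed to be a `boto3.client("neptune")`, and `delete_edges` and `run_bulk_load` are hypothetical callables you would supply (the latter returning `True` only when the load completes).

```python
import time


def reload_with_snapshot(neptune, cluster_id, snapshot_id, delete_edges,
                         run_bulk_load, poll_seconds=30, timeout_seconds=3600):
    """Snapshot the cluster, then delete and reload the edges.

    Returns a dict saying whether the load succeeded and, if it did not,
    which snapshot to restore from.
    """
    # Step 1: take the snapshot and wait until it is usable.
    neptune.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=cluster_id,
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = neptune.describe_db_cluster_snapshots(
            DBClusterSnapshotIdentifier=snapshot_id
        )
        if response["DBClusterSnapshots"][0]["Status"] == "available":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"snapshot {snapshot_id} was not available in time")
        time.sleep(poll_seconds)

    # Steps 2 and 3: only touch the data once the snapshot exists.
    try:
        delete_edges()
        loaded = bool(run_bulk_load())
    except Exception:
        # Any failure after the snapshot means a restore may be needed.
        loaded = False

    return {"loaded": loaded, "restore_from": None if loaded else snapshot_id}
```

If `restore_from` is set, the restore itself would go through `restore_db_cluster_from_snapshot`, followed by creating an instance in the new cluster and repointing the application at its endpoint.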

  • I understand you did not read the question. It's not the entire DB, only certain edges, so your approach will not work. – user2026504 Dec 01 '21 at 08:07
  • Step 2, the edge deletion, is up to you: you can delete some of the edges or all of them. How does it matter? There is no transaction block in Neptune that lets you combine a bulk load and a manual edge deletion in a single transaction. Sorry, what did I miss when you said I did not read the question? – PrashantUpadhyay Dec 01 '21 at 17:52
  • These are the steps I am following for now, and I will see how it goes on a daily basis. 1. Take a backup of all existing edges based on the "from" vertex. 2. Delete those existing edges. 3. Bulk load all the edges from the incoming file. 4. If step 3 fails, generate a notification for the user and reload all the edges from the backup files. 5. If step 4 also fails for any reason, generate a notification for the admin. There will also be a UI where the admin can manually upload the backup files when step 4 fails. – user2026504 Dec 02 '21 at 15:44
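
The backup-and-restore routine described in the last comment could look something like the sketch below. It assumes the edges to back up have already been fetched into plain dicts with `id`, `from`, `to`, `label`, and optional `properties` keys (a hypothetical shape), and it writes them in the bulk loader's Gremlin CSV edge format (`~id`, `~from`, `~to`, `~label`, then typed property columns). The callables passed to `reload_with_backup` are placeholders for your own backup, delete, load, and notification code.

```python
import csv
import io


def edges_to_loader_csv(edges, property_types=None):
    """Serialize edge dicts as a CSV file the bulk loader can re-import.

    property_types maps a property name to its loader type, e.g. {"since": "Int"}.
    """
    property_types = property_types or {}
    header = ["~id", "~from", "~to", "~label"]
    header += [f"{name}:{kind}" for name, kind in property_types.items()]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for edge in edges:
        props = edge.get("properties", {})
        row = [edge["id"], edge["from"], edge["to"], edge["label"]]
        row += [props.get(name, "") for name in property_types]
        writer.writerow(row)
    return buffer.getvalue()


def reload_with_backup(backup, delete, load_incoming, load_backup,
                       notify_user, notify_admin):
    """Run the five steps; each load callable returns True on success."""
    backup()                      # 1. back up the existing edges
    delete()                      # 2. delete them
    if load_incoming():           # 3. bulk load the incoming file
        return "loaded"
    notify_user("bulk load failed; restoring edges from backup")
    if load_backup():             # 4. reload the backup files
        return "restored"
    notify_admin("restore from backup failed; manual upload needed")
    return "failed"               # 5. left for the admin to fix manually
```

Writing the backup in the loader's own format means step 4 can reuse the same bulk load call as step 3, just pointed at the backup file.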