I'm trying to insert data from my SQL database into Neo4j. I have a CSV file where every row generates 4-5 entities and some relations between them. The same entity can appear in multiple rows, and I want to enforce uniqueness.
What I currently do is:
- Create a uniqueness constraint for each label.
- Iterate over the CSV; for each row (simplified sketch after this list):
  - start a transaction
  - build and run MERGE statements for the entities
  - build and run MERGE statements for the relations
  - commit the transaction
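To make this concrete, here is a simplified sketch of what one row ends up doing. The labels, properties and values (`Person`, `Company`, `personId`, ...) are made up for illustration, and I'm showing it with the Neo4j Java driver (4.x style); my real statements are generated from the CSV columns:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Transaction;

import static org.neo4j.driver.Values.parameters;

public class ImportSketch {
    public static void main(String[] args) {
        Driver driver = GraphDatabase.driver(
                "bolt://localhost:7687", AuthTokens.basic("neo4j", "password"));

        try (Session session = driver.session()) {
            // One-time setup: a uniqueness constraint per label
            // (newer Cypher syntax; older versions use CREATE CONSTRAINT ON ... ASSERT ... IS UNIQUE).
            session.run("CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.personId IS UNIQUE");
            session.run("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Company) REQUIRE c.companyId IS UNIQUE");

            // Per CSV row: one transaction, MERGE the entities, then MERGE the relations between them.
            try (Transaction tx = session.beginTransaction()) {
                tx.run("MERGE (p:Person {personId: $pid})", parameters("pid", "p-123"));
                tx.run("MERGE (c:Company {companyId: $cid})", parameters("cid", "c-456"));
                tx.run("MATCH (p:Person {personId: $pid}), (c:Company {companyId: $cid}) "
                        + "MERGE (p)-[:WORKS_AT]->(c)",
                        parameters("pid", "p-123", "cid", "c-456"));
                tx.commit();
            }
        }
        driver.close();
    }
}
```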
That gave poor results, so I tried committing the transaction every X rows instead (with X = 100, 500, 1000 and 5000).
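The batched version looks roughly like this (same driver and imports as the sketch above, plus `java.util.List` / `java.util.Map`; `rows` stands in for however the CSV is read, one map per CSV row):

```java
// Variant of the loop above: commit every `batchSize` rows instead of every row.
static void importBatched(Driver driver, List<Map<String, Object>> rows, int batchSize) {
    try (Session session = driver.session()) {
        Transaction tx = session.beginTransaction();
        int rowsInTx = 0;
        for (Map<String, Object> row : rows) {
            tx.run("MERGE (p:Person {personId: $pid})", parameters("pid", row.get("pid")));
            tx.run("MERGE (c:Company {companyId: $cid})", parameters("cid", row.get("cid")));
            tx.run("MATCH (p:Person {personId: $pid}), (c:Company {companyId: $cid}) "
                    + "MERGE (p)-[:WORKS_AT]->(c)",
                    parameters("pid", row.get("pid"), "cid", row.get("cid")));

            if (++rowsInTx == batchSize) {   // commit every X rows (100 / 500 / 1000 / 5000 in my tests)
                tx.commit();
                tx = session.beginTransaction();
                rowsInTx = 0;
            }
        }
        tx.commit();                         // final partial batch
    }
}
```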
It's better now, but I still have two problems:

- It's slow: on average around 1-1.5 seconds per 100 rows (each row = 4-5 entities and 4-5 relations).
- It gets worse as I keep adding data. I usually start at 400-500 ms per 100 rows, and after ~5000 rows I'm at ~4-5 seconds per 100 rows.
From what I know, the constraint also creates an index on that property, and that's exactly the property I use in the MERGE when creating the node. Is there any chance MERGE isn't using the index?
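Is something like this the right way to check? It runs PROFILE on one of the generated MERGE statements and prints the plan operators (again Java driver 4.x style, placeholder label/property); I'd expect to see a NodeUniqueIndexSeek (or similar index seek) if the index is used, whereas a NodeByLabelScan or AllNodesScan would explain the slowdown as the graph grows:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.summary.ProfiledPlan;

import static org.neo4j.driver.Values.parameters;

public class ProfileMergeSketch {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver(
                "bolt://localhost:7687", AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // PROFILE one generated MERGE statement and dump its execution plan.
            Result result = session.run(
                    "PROFILE MERGE (p:Person {personId: $pid}) RETURN p",
                    parameters("pid", "probe-1"));
            printPlan(result.consume().profile(), 0);
        }
    }

    // Recursively print each operator in the profiled plan with its db hits.
    static void printPlan(ProfiledPlan plan, int depth) {
        System.out.println("  ".repeat(depth) + plan.operatorType() + "  dbHits=" + plan.dbHits());
        for (ProfiledPlan child : plan.children()) {
            printPlan(child, depth + 1);
        }
    }
}
```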
What's the best practice for improving performance here? I saw BatchInserter, but I wasn't sure whether it can be used with MERGE operations.
Thanks