I am using Neo4j Server 2.3.2 with unmanaged plug-ins to parse unstructured data and load it into the graph.
While doing this, I ran into duplicate nodes and slow throughput with sequential processing (even with batching). Since our use cases involve repeated data loads, I am looking at parallel processing (using server plugins), using either:
Splitting the input files
Splitting the process across threads
With parallel data loads, the biggest challenge I foresee is data integrity, in particular how to avoid creating duplicate nodes. In Neo4j's reference material I found the following options for creating unique nodes.
Options:
[Preferred] Get or create a unique node using Cypher and unique constraints
[Other] Pessimistic locking from the Java API
[Other] Get or create a unique node using a legacy index
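To make the concern concrete, here is a minimal plain-Java sketch of the race that all three options are meant to close. It deliberately has no Neo4j dependency: `FakeNode`, `FakeStore`, `overlappingSplits` and `loadInParallel` are hypothetical names, and `ConcurrentHashMap.computeIfAbsent` only stands in for the per-key atomicity I expect from a unique constraint (or an explicit write lock) inside Neo4j. It is an analogy for the split-files/threading load, not how Neo4j implements uniqueness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class GetOrCreateSketch {

    // Hypothetical stand-in for a graph node (NOT a Neo4j type).
    static final class FakeNode {
        final long id;
        final String key;

        FakeNode(long id, String key) {
            this.id = id;
            this.key = key;
        }
    }

    // Hypothetical in-memory store indexed on the property that must be unique.
    static final class FakeStore {
        private final ConcurrentHashMap<String, FakeNode> byKey = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong();
        private final AtomicLong created = new AtomicLong();

        // Atomic get-or-create: computeIfAbsent runs the factory at most once per key,
        // so two loader threads seeing the same key cannot both create a node.
        // A naive "if (!exists(key)) create(key)" would leave a window for duplicates.
        FakeNode getOrCreate(String key) {
            return byKey.computeIfAbsent(key, k -> {
                created.incrementAndGet();
                return new FakeNode(nextId.getAndIncrement(), k);
            });
        }

        long createdCount() {
            return created.get();
        }
    }

    // Builds 'splits' input chunks that all contain the same 'keys' values,
    // each starting at a different offset, to mimic overlapping split files.
    static List<List<String>> overlappingSplits(int splits, int keys) {
        List<List<String>> result = new ArrayList<>();
        for (int s = 0; s < splits; s++) {
            List<String> split = new ArrayList<>();
            for (int i = 0; i < keys; i++) {
                split.add("user-" + ((i + s) % keys));
            }
            result.add(split);
        }
        return result;
    }

    // Loads every split as its own task on a thread pool, all writing into one shared store.
    static FakeStore loadInParallel(List<List<String>> splits, int threads) {
        FakeStore store = new FakeStore();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> split : splits) {
            pool.submit(() -> split.forEach(store::getOrCreate));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return store;
    }

    public static void main(String[] args) {
        FakeStore store = loadInParallel(overlappingSplits(4, 10), 4);
        // 4 splits x 10 rows, but only 10 distinct keys, so only 10 nodes are created.
        System.out.println("nodes created: " + store.createdCount());
    }
}
```

Running `main` prints `nodes created: 10` for 40 input rows. Replacing `computeIfAbsent` with a separate `containsKey` check followed by `put` reopens the window in which two threads both miss the lookup and both create a node, which is the duplicate-node situation I am trying to avoid in the graph.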
My question is: where can I enforce this unique constraint using the Java API? I am looking for some event I can hook into in order to enforce the constraints. What is the best way to load or invoke such events/methods so that I can inject the constraint-enforcement code there?
Also, is there a way to define this uniqueness rule using Cypher before any entity is created?
Thanks in advance
References: