I am trying to parallelize a graph creation query that uses MERGE across multiple processes, with py2neo. Here is the function that issues the query:
def add_edge(start_node_name, start_type, end_node_name, end_type):
    # Merge both endpoint nodes, then merge the relationship and count its occurrences.
    statement = """MERGE (A:Entity{Name:{Start}, Type:{T_start}})
    MERGE (B:Entity{Name:{End}, Type:{T_end}})
    MERGE (A)-[r:LINKED_TO]->(B) ON CREATE SET r.cnt = 1 ON MATCH SET r.cnt = r.cnt + 1"""
    trans_action.run(statement, {"Start": start_node_name, "End": end_node_name, "T_start": start_type, "T_end": end_type})
Each process executes this same function many times, once for each node it encounters. The values of start_node_name and end_node_name can be repeated, which is why I am using MERGE.
As I understand it, MERGE prevents a node from being created multiple times: if it matches an already existing node, it just picks that node and continues. However, I just received this error from py2neo:
py2neo.database.ClientError: ConstraintValidationFailed: Node(411606) already exists with label `Entity` and property `Name` = 'INSA RENNES'
This should not happen: if the node already existed, MERGE would have matched it and simply used it. The only explanation I can see is that transactions do not guarantee atomicity, so two processes both saw that the node did not exist and tried to create it at the same time. As far as I understand, that should not be possible either, since transactions SHOULD be atomic. It could be an issue in py2neo, but judging from the error message, it looks more like a Neo4j one.
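To make the interleaving I suspect concrete, here is a minimal sketch in plain Python. It does not talk to Neo4j at all and says nothing about how Neo4j actually locks or schedules MERGE. ToyStore, ConstraintViolation, interleaved_merge and sequential_merge are hypothetical names made up for illustration, and the store only models a uniqueness constraint on Name. It shows that if the "match" and "create" halves of a merge are not one atomic step, two workers can both miss the node and the second create then fails.

```python
class ConstraintViolation(Exception):
    """Hypothetical stand-in for the ConstraintValidationFailed error."""


class ToyStore:
    """In-memory stand-in for a label with a uniqueness constraint on Name."""

    def __init__(self):
        self.names = set()

    def match(self, name):
        # The "match" half of a merge: does the node already exist?
        return name in self.names

    def create(self, name):
        # The "create" half: the uniqueness constraint is enforced here.
        if name in self.names:
            raise ConstraintViolation(f"node with Name = {name!r} already exists")
        self.names.add(name)


def interleaved_merge(store, name):
    """Two non-atomic merges of the same name, interleaved step by step."""
    # Both workers run their match step before either one creates.
    a_found = store.match(name)
    b_found = store.match(name)
    outcomes = []
    for found in (a_found, b_found):
        if found:
            outcomes.append("matched")
            continue
        try:
            store.create(name)
            outcomes.append("created")
        except ConstraintViolation:
            outcomes.append("constraint error")
    return outcomes


def sequential_merge(store, name):
    """The same two merges, each run as one uninterrupted match-then-create."""
    outcomes = []
    for _ in range(2):
        if store.match(name):
            outcomes.append("matched")
        else:
            store.create(name)
            outcomes.append("created")
    return outcomes


if __name__ == "__main__":
    print(interleaved_merge(ToyStore(), "INSA RENNES"))
    print(sequential_merge(ToyStore(), "INSA RENNES"))
```

In this toy model the interleaved run ends with one worker creating the node and the other hitting the constraint, while the sequential run ends with one create and one match, which is the behaviour I expected from MERGE.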