I'm currently using py2neo to interface with my Neo4j server. One thing I'd like to do is enforce a uniqueness constraint on a labeled property (i.e. have the server enforce uniqueness of a client-generated hash). For the sake of example, I have the following schema:
ON :Organization(uid) ONLINE (for uniqueness constraint)
Since I'm using py2neo, my normal node creation sequence usually entails:
- Generate the UID hash based on properties of the organization
- Add it to the database
- Add the "Organization" label to the hydrated node returned by the add statement
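For reference, the hash step looks roughly like the sketch below. The helper name `organization_uid` and the choice of SHA-256 over a sorted JSON dump are illustrative assumptions, not my exact code; the only point is that the same properties always produce the same UID.

```python
import hashlib
import json


def organization_uid(properties):
    """Derive a deterministic UID from an organization's properties.

    Keys are sorted so that dict ordering does not change the hash.
    """
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same properties in a different order give the same UID.
uid_a = organization_uid({"name": "Acme", "country": "US"})
uid_b = organization_uid({"country": "US", "name": "Acme"})
```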
This works just fine. When I go to create a duplicate node, I:
- Generate the UID hash based on properties of the organization
- Add it to the database
- Attempt to add the "Organization" label, which fails due to the uniqueness constraint.
The problem with the last step is that I now have a label-less duplicate node in my graph. Instead, I'd like to get a reference to the existing node, since this usually runs in the context of relationship creation. To accomplish this, I need to be able to create the node and label it before adding it to the graph, which currently cannot be done cleanly with py2neo or the REST API. I can't use the batch API either, as that fails with the same error (and doesn't return a copy of the existing node).
A workaround is:
- Generate the UID hash based on properties of the organization
- Query the database for a node with that hash
- If it exists, use that; otherwise, add a node to the database and then add the "Organization" label to it.
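The workaround above can be sketched as follows. This is only an outline of the control flow: `find_by_uid` and `create_labeled` are hypothetical callables standing in for whatever py2neo calls do the lookup and the create-then-label steps, and `InMemoryGraph` is a stand-in so the example runs without a server.

```python
def get_or_create(uid, properties, find_by_uid, create_labeled):
    """Return (node, created) using a lookup followed by a conditional create.

    find_by_uid and create_labeled are hypothetical hooks, not py2neo APIs.
    """
    existing = find_by_uid(uid)  # first network round trip
    if existing is not None:
        return existing, False
    # second (and third) round trip: create the node, then label it
    return create_labeled(uid, properties), True


class InMemoryGraph:
    """Stand-in for the database so the sketch can be run locally."""

    def __init__(self):
        self._by_uid = {}

    def find_by_uid(self, uid):
        return self._by_uid.get(uid)

    def create_labeled(self, uid, properties):
        node = dict(properties, uid=uid)
        self._by_uid[uid] = node
        return node


graph = InMemoryGraph()
first, created_first = get_or_create(
    "abc", {"name": "Acme"}, graph.find_by_uid, graph.create_labeled)
second, created_second = get_or_create(
    "abc", {"name": "Acme"}, graph.find_by_uid, graph.create_labeled)
```

Note that against a real server the lookup and the create are separate requests, so another client could insert the same UID in between; the uniqueness constraint would then reject the labeling step just as in the duplicate case.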
The downside of this workaround is that I'm performing extra network requests as well as avoidable I/O. The Cypher analogue I'm looking for is MERGE. It seems I have two options here:
- Instead of using the standard graph create operation, I convert the node abstract to a Cypher MERGE statement and execute that.
- Fall back to the "legacy" indexing system which provides a get_or_create method.
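For the first option, a minimal sketch of turning a node's properties into a MERGE statement might look like this. The helper `build_merge` is hypothetical, and I've left out the call that actually executes the query because that differs between py2neo versions. The `{name}` placeholders are the older Cypher parameter syntax; newer servers expect `$name` instead.

```python
import re

# Labels and property keys are interpolated into the query text,
# so only plain identifiers are accepted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge(label, key, properties):
    """Return (query, params) for a MERGE keyed on one unique property."""
    for name in [label, key] + list(properties):
        if not _IDENT.match(name):
            raise ValueError("unsafe identifier: %r" % name)
    if key not in properties:
        raise ValueError("properties must include the key %r" % key)

    # Match (or create) on the unique key only.
    query = "MERGE (n:%s {%s: {%s}})" % (label, key, key)

    # Remaining properties are written only when the node is new.
    extra = sorted(k for k in properties if k != key)
    if extra:
        assignments = ", ".join("n.%s = {%s}" % (k, k) for k in extra)
        query += " ON CREATE SET " + assignments

    query += " RETURN n"
    return query, dict(properties)


query, params = build_merge(
    "Organization", "uid", {"uid": "abc", "name": "Acme"})
```

Because MERGE matches on `uid` alone, a second run with the same UID should return the existing node rather than creating a label-less duplicate, which is the behavior I'm after.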
The legacy indexing system also seems to offer a better short-term outlook, in that I can create full-text indices, and I seem to get better performance out of it as well. Any thoughts/suggestions?