Which indexing system should I use?

Question

I'm currently using py2neo to interface with my neo4j server. One thing that I'd like to do is enforce a uniqueness constraint for a label (i.e. enforce a unique client-generated hash on the server side). For the sake of example, I have the following schema:

ON :Organization(uid)    ONLINE (for uniqueness constraint)

Since I'm using py2neo, my normal node creation sequence usually entails:

1. Generate the UID hash based on properties of the organization
2. Add it to the database
3. Add the "Organization" label to the hydrated node returned by the add statement

This works just fine. When I go to create a duplicate node, I:

1. Generate the UID hash based on properties of the organization
2. Add it to the database
3. Attempt to add the "Organization" label, which fails due to the uniqueness constraint.

The problem with the step above is that I now have a label-less duplicate node on my graph. Instead I'd like to get a reference to the existing node since this is usually executed within the context of relationship creation. To accomplish this, I need to be able to create the node and label it before adding it to the graph, which currently cannot be done cleanly with py2neo/the REST API. I can't use the batch API as that fails with the same error (and doesn't return a copy of the existing node).

A workaround is:

1. Generate the UID hash based on properties of the organization
2. Query the database for a node with that hash
3. If it exists, use that, otherwise add a node to the database and then add the "Organization" label to the node.

The downside of that is I'm performing extra network requests as well as avoidable I/O. The Cypher analogue I'm looking for is MERGE. It seems as if I have two or three options here:

Instead of using the standard graph create operation, I convert the node abstract to a Cypher MERGE statement and execute that.
Fall back to the "legacy" indexing system which provides a get_or_create method.

The legacy indexing system also seems to provide a better short-term outlook in that I can create full text indices, and it seems as if I also get better performance out of it. Any thoughts/suggestions?

score 2 · Accepted Answer · answered Jun 02 '14 at 21:28

I'd say use MERGE, which also does the correct locking and guarantees the uniqueness of your node.

As far as I know, the uniqueness check is done immediately, though I'm not sure how visible the changes made by other threads running at the same time are. MERGE takes an index lock and makes sure only one thread at a time checks the uniqueness constraint.
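To make the MERGE suggestion concrete, here is a minimal sketch of turning a label plus a property dict into a parameterised MERGE statement. The helper names (`organization_uid`, `build_merge`), the hash inputs, and the parameter names are assumptions for illustration and are not part of py2neo. The `$name` parameter syntax is the one used by current Neo4j releases; 2.x servers used `{name}` instead.

```python
import hashlib
import re

# Labels and property keys cannot be sent as Cypher parameters, so they are
# checked against a conservative pattern before being placed in the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def organization_uid(name, country):
    """Hypothetical client-side hash; the question does not give the real scheme."""
    return hashlib.sha1("{}|{}".format(name, country).encode("utf-8")).hexdigest()


def build_merge(label, key, properties):
    """Return (query, params) that match a node on label/key or create it."""
    for identifier in (label, key):
        if not _IDENTIFIER.match(identifier):
            raise ValueError("unsafe identifier: {!r}".format(identifier))
    if key not in properties:
        raise KeyError(key)
    query = (
        "MERGE (n:{label} {{{key}: $key_value}}) "
        "ON CREATE SET n += $props "
        "RETURN n"
    ).format(label=label, key=key)
    # The remaining properties are only written when the node is newly created.
    props = {k: v for k, v in properties.items() if k != key}
    return query, {"key_value": properties[key], "props": props}


uid = organization_uid("Acme", "US")
query, params = build_merge("Organization", "uid", {"uid": uid, "name": "Acme"})
```

The query and parameters can then be passed to whatever Cypher execution call your client version offers. That call is left out here because it differs between py2neo releases. Either way the returned node is the existing one or the newly created one, so no label-less duplicate is left behind.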

answered Jun 02 '14 at 21:28

Michael Hunger

Thanks! I'll go with this since it seems like the least drastic change to my existing code. This also leaves the window open for the next major release of py2neo which should address labels. – Mr. S Jun 03 '14 at 21:09

score 1 · Answer 2 · answered Jun 03 '14 at 09:14

The next version of py2neo (1.7) will be able to handle this sort of situation more fluidly. I'm currently building functionality to separate client-side entity manipulation (e.g. labels and properties) from client-server synchronisation. This means it will be possible to create a node within an application and then push it to the server in a single HTTP request.

The code will look something like:

from py2neo import Graph, Node

graph = Graph()

# Define a node client-side with a label and a property
node = Node("Person", name="Alice")

# Create the node on the server
# (this will bind the client-side node to a new server node)
graph.create(node)

# Make a few changes (client-side)
node.labels.add("Employee")
node.properties["employee_no"] = 42

# Push the changes to the bound server-side entity
node.push()

Note that the code here is only an example and may change before release!

Thanks for the information (and the useful library)! I'll see if I can find some time to contribute to the 1.7 milestone to speed up its arrival. — Mr. S, Jun 03 '14 at 21:12
