I am working with an application that uses a Neo4j graph containing about 10 million nodes. One of the main tasks that I run daily is a batch import of new/updated nodes into the graph, on the order of 1-2 million. After experimenting with Python scripts in combination with the Cypher query language, I decided to try the embedded graph with the Java API in order to get better performance.
What I found is roughly a 5x improvement using the native Java API. I am using Neo4j 2.1.4, which I believe is the latest. I have read in other posts that the embedded graph is a bit faster, but that this should/could be changing in the near future. Has anyone observed similar results? I would like to validate my findings.
I have included snippets below to give a general sense of the methods used; the code has been greatly simplified.
Sample from Cypher/Python:
cnode = self.graph_db.create(node(
    hash=obj.hash,
    name=obj.title,
    date_created=str(datetime.datetime.now()),
    date_updated=str(datetime.datetime.now())
))
Sample from the embedded graph using Java:
final Node n = Graph.graphDb.createNode();
for (final Label label : labels) {
    n.addLabel(label);
}
for (Map.Entry<String, Object> entry : properties.entrySet()) {
    n.setProperty(entry.getKey(), entry.getValue());
}
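For context on how a snippet like this fits into a batch import: in the 2.x embedded API, writes have to happen inside a transaction, and a large import is commonly split into chunks that are each committed separately. The following is a rough sketch of that chunking step only, not the actual import code. `BatchChunker`, `partition`, and the batch size of 4 are hypothetical names and values chosen for illustration, and the Neo4j calls appear only in a comment so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

class BatchChunker {

    // Splits items into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist view so each batch is independent of the source list.
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the records to import (hypothetical data).
        List<Integer> records = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            records.add(i);
        }
        for (List<Integer> batch : partition(records, 4)) {
            // With the embedded API, each batch would be wrapped roughly as:
            //   try (Transaction tx = graphDb.beginTx()) {
            //       ... createNode() / addLabel() / setProperty() per record ...
            //       tx.success();
            //   }
            System.out.println("would commit " + batch.size() + " nodes");
        }
    }
}
```

The batch size is a tuning parameter: very small batches pay the commit overhead many times, while very large ones keep more transaction state in memory, so a suitable value would need to be measured for the data set at hand.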
Thank you for your insight!