Neo4j Server vs. Embedded

Question

I'm a little confused about which solution is best for my application. As far as I can see, I have to choose between the Neo4j standalone server (RestGraphDatabase) and an EmbeddedGraphDatabase (the RemoteGraphDatabase is not yet ready for production use).

Pros REST:

-> Different services can access the Neo4j DB. For example, I have one service that is responsible for nodes of kinds A, B and C, and a second service that is responsible for nodes D and H and can connect D nodes to A nodes. That way I have clean domain structures: every service is responsible only for its own domain nodes, and I can update each service without shutting down the whole application.

-> I can access the Neo4j DB from different languages (e.g. PHP).

Cons:

-> Performance is not as good as with an EmbeddedGraphDatabase (although, since the Neo4j server and the services run on the same machine, the latency is not that big).

-> No transactions.

My questions: Is it a good decision to go with the standalone server? Or should I use the embedded one and merge the services into one big service? Is it possible to run a big, complex application without transaction support?

score 9 · Accepted Answer · answered Nov 22 '11 at 14:46

You're correct that performance with the REST server will be lower. However, you can get something like transactions with the REST server by using batch operations; see http://docs.neo4j.org/chunked/milestone/rest-api-batch-ops.html. You can also build domain-specific server plugins that perform your transactional logic on the server side: http://docs.neo4j.org/chunked/milestone/server-extending.html.
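To make the batch idea concrete, here is a minimal Python sketch of the JSON body for the legacy batch endpoint (POST /db/data/batch on Neo4j servers of this era). It assumes the job format described in the batch-operations documentation linked above (`method`, `to`, optional `body`, `id`) and the `{N}` placeholder syntax for referring to the result of an earlier job in the same batch. The properties `kind`/`name`, the relationship type `CONNECTED_TO` and the helper `build_batch` are hypothetical, chosen to mirror the "connect D nodes to A nodes" case from the question. Per that documentation, the server runs the whole batch in one transaction, so if one job fails, the changes from the others are rolled back too.

```python
import json


def build_batch(a_props, d_props, rel_type="CONNECTED_TO"):
    """Return batch jobs that create an A node, a D node, and a
    relationship from the D node to the A node.

    Hypothetical helper: the job layout follows the legacy Neo4j REST
    batch API, but the property names and rel_type are illustrative.
    """
    return [
        # Job 0: create the A node with the given properties.
        {"method": "POST", "to": "/node", "body": a_props, "id": 0},
        # Job 1: create the D node with the given properties.
        {"method": "POST", "to": "/node", "body": d_props, "id": 1},
        # Job 2: "{1}" and "{0}" are placeholders that the server replaces
        # with the URIs of the nodes created by jobs 1 and 0.
        {
            "method": "POST",
            "to": "{1}/relationships",
            "body": {"to": "{0}", "type": rel_type},
            "id": 2,
        },
    ]


if __name__ == "__main__":
    jobs = build_batch({"kind": "A", "name": "a1"}, {"kind": "D", "name": "d1"})
    # This JSON array is what would be POSTed to /db/data/batch.
    print(json.dumps(jobs, indent=2))
```

Because all three jobs travel in one request, a D node can never end up persisted without its link to the A node.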

If your architecture requires that you be able to access the database from multiple client machines, your only options are the REST server or Neo4j HA (High Availability). HA is only available with a Neo4j Enterprise license.

Let application architecture inform which tools are used, not the other way around. If you've already decided that your application is best as separate services, don't combine them into one just to support the underlying persistence model. I don't know anything about your application, but from your description, I would choose the REST server and utilize batches or server plugins.

I'd like to add that the REST-API (tested with two Python libs) has severe performance issues with large data sets (we were importing 10 GB, so not even a really huge data set). We used the batch importer but after a certain limit, the server almost blocks. There are open discussions about that problem, but I am not aware of a solution yet. In general I would recommend the embedded setting for all heavy lifting. — Bouncner, Jan 14 '13 at 17:52
@Bouncner Three years on, do you know if this is still the case? Around the same time as you we also noticed this performance issue, but haven't used it since. — Spencer Kormos, Dec 18 '15 at 02:04

score 7 · Answer 2 · answered Nov 23 '11 at 09:29

It all depends on your use case. You already listed some of the pros and cons.

Another advantage of the server is the web admin / visualization tool.

You have a few more options. You can use an embedded graph DB for high performance, run only some services embedded, and expose the graph database to the other services through a custom, domain-centric remote API (REST or otherwise).

The same can be achieved with the Neo4j Server by adding the more performance-critical services as server plugins or extensions, which can also expose a custom remote API that probably suits your use cases better.

I would start by using the embedded graph DB while developing your services. If you later want to expose certain endpoints to other services, it is quite easy to switch to the Neo4j server.

In the REST API there is one transaction per request; for larger operations, the API provides a batch operation.
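Since each plain REST call is committed on its own, the way to group several operations into one transaction is to send them as one batch request. A minimal sketch using only the Python standard library is below. It assumes a local server on the default port 7474 with the legacy `/db/data/batch` path; `make_batch_request` and `send_batch` are hypothetical helper names, and the sample jobs are illustrative.

```python
import json
import urllib.request

# Assumed location of a local Neo4j server's legacy batch endpoint.
DEFAULT_BATCH_URL = "http://localhost:7474/db/data/batch"


def make_batch_request(jobs, url=DEFAULT_BATCH_URL):
    """Wrap a whole list of batch jobs in a single POST request."""
    data = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_batch(jobs, url=DEFAULT_BATCH_URL, timeout=30):
    """Send all jobs in one HTTP call and return the decoded results.

    urlopen raises urllib.error.HTTPError on a non-2xx response and
    urllib.error.URLError if the server cannot be reached.
    """
    req = make_batch_request(jobs, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    jobs = [
        {"method": "POST", "to": "/node", "body": {"name": "bob"}, "id": 0},
        {"method": "POST", "to": "/node", "body": {"age": 12}, "id": 1},
    ]
    req = make_batch_request(jobs)
    print(req.get_method(), req.full_url)
    # send_batch(jobs) would contact the server; it is not called here.
```

Issuing the two node creations as two separate requests instead would give two independent transactions, so a failure in the second would not undo the first.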
