3

According to my benchmark, creating nodes with

GraphClient.Create()

performs poorly: I get about 10 empty nodes per second on my machine (Core i3, 8 GB RAM).

Multithreading does not help: the time taken by each Create() call increases linearly with the number of threads (~N times slower with N threads), so overall throughput stays the same.

I've tested both the stable 1.9.2 release and 2.0.0-M04. The results are exactly the same.

Does anybody know what's wrong?

EDIT: I tried the Neo4j REST API directly and got similar results: ~20 empty nodes per second, and multithreading gives no benefit there either.

EDIT 2: By contrast, the Batch REST API, which allows several creations in one request, performs much better: about 250 nodes per second. It looks like there is an incredibly large overhead in handling each single request...
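For reference, here is a rough sketch of how the body of such a batch request could be built. It assumes the legacy REST batch endpoint (POST to /db/data/batch) and its job format of method, to, body and id; the class and method names are made up for illustration, and the empty body object for an empty node is my assumption.

```csharp
using System;
using System.Linq;

public static class BatchPayload
{
    // Builds a JSON array of batch jobs, one "create node" job per node.
    // Assumption: each job's "to" path is relative to /db/data and an
    // empty "body" object means a node without properties.
    public static string ForEmptyNodes(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        var jobs = Enumerable.Range(0, count)
            .Select(i => "{\"method\":\"POST\",\"to\":\"/node\",\"body\":{},\"id\":" + i + "}");

        return "[" + string.Join(",", jobs) + "]";
    }
}
```

The resulting string would be sent as a single application/json POST, so the per-request overhead is paid once for the whole batch rather than once per node.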

Eugene D. Gubenkov

2 Answers

4

The poor performance is caused by the overhead of processing a RESTful Cypher query. Most of it is network overhead, but the need to parse the query adds overhead as well.

Use the Core Java API when you are interested in high performance. It processes requests more than 10 times faster than the Cypher query language.

See these articles:

Eugene D. Gubenkov
1

The neo4jclient itself uses the REST API, so you're already limited in performance (by bandwidth, network latency, etc.) compared to a direct API call (for which you'd need Java).

  • What performance are you after?
  • What code are you running?

Some initial thoughts & tests to try:

Obviously things like the CPU will cause some throttling. Some things to consider:

  1. Is the Neo4J server on the same machine?
  2. Have you tried your application not through Visual Studio? (i.e. no debugging)

In my test code (below), I get 10 entries in ~200ms - can you try this code in a simple console app and see what you get?

using System;
using System.Diagnostics;
using Neo4jClient;

internal static class Program
{
    private static void Main()
    {
        var client = new GraphClient(new Uri("http://localhost.:7474/db/data"));
        client.Connect();

        // Run the benchmark 10 times, creating 10 empty nodes each time.
        for (int i = 0; i < 10; i++)
            CreateEmptyNodes(10, client);
    }

    private static void CreateEmptyNodes(int numberToCreate, IGraphClient client)
    {
        // Stopwatch is more precise than DateTime.Now for short intervals.
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < numberToCreate; i++)
            client.Create(new object());

        stopwatch.Stop();
        Console.WriteLine("For {0} items, I took: {1}ms", numberToCreate, stopwatch.Elapsed.TotalMilliseconds);
    }
}
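To compare these timings with the nodes-per-second figures in the question, the elapsed time can be converted to throughput. A minimal sketch (the helper name is hypothetical, not part of any library):

```csharp
using System;

public static class Throughput
{
    // Converts "items created in elapsed time" into items per second.
    public static double PerSecond(int items, TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(elapsed));
        return items / elapsed.TotalSeconds;
    }
}
```

For example, Throughput.PerSecond(10, TimeSpan.FromMilliseconds(200)) works out to 50 nodes per second for the ~200ms figure above.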

EDIT:

This is a raw HttpClient approach to calling the 'Create', which I believe is analogous to what neo4jclient is doing under the hood:

// Requires: using System; using System.Diagnostics; using System.Net;
//           using System.Net.Http; using System.Text; using System.Threading.Tasks;
// Returns Task rather than void so the caller can wait for completion.
private static async Task StraightHttpClient(int iterations, int amount)
{
    var client = new HttpClient {BaseAddress = new Uri("http://localhost.:7474/db/data/")};

    for (int j = 0; j < iterations; j++)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < amount; i++)
        {
            // Each request needs a fresh message and content.
            var request = new HttpRequestMessage(HttpMethod.Post, "cypher/")
            {
                Content = new StringContent("{\"query\":\"create me\"}", Encoding.UTF8, "application/json")
            };
            var response = await client.SendAsync(request);
            if (response.StatusCode != HttpStatusCode.OK)
                Console.WriteLine("Not ok");
        }
        stopwatch.Stop();
        Console.WriteLine("took {0}ms", stopwatch.Elapsed.TotalMilliseconds);
    }
}

Now, if you didn't care about the response, you could just call client.SendAsync(..) without the await, and that gets you to a spiffy ~2500 per second. The big issue, obviously, is that you haven't necessarily sent any of those creates; you've basically just queued them. Shut down your program straight afterwards and chances are you'll have either no entries or a very small number.

So, clearly the code can handle firing x thousand calls a second with no problems. (I've done a similar test to the above using ServiceStack and RestSharp; both take similar times to the HttpClient.)

What it can't do is send those to the actual server at the same rate, so we're limited by the Windows HTTP stack and/or how fast Neo4j can process a request and supply a response.
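One way to keep the requests in flight concurrently while still making sure they have all completed before the program exits is to collect the tasks and wait on them together. This is only a sketch: the class and method names are hypothetical, and the send delegate stands in for whatever actually issues the request (for example a wrapper around client.SendAsync that returns whether the status was OK).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FireAndWait
{
    // Starts `amount` sends without awaiting each one individually,
    // then waits for all of them and returns how many reported success.
    public static async Task<int> SendAllAsync(int amount, Func<int, Task<bool>> send)
    {
        // ToArray forces every send to start before we begin waiting.
        Task<bool>[] pending = Enumerable.Range(0, amount).Select(send).ToArray();

        bool[] results = await Task.WhenAll(pending);
        return results.Count(ok => ok);
    }
}
```

Note that this measures the time until the server has answered every request, not just the time to queue them, so I would not expect it to reach the queue-only figure.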

Charlotte Skardon
  • I've run your code: on average, creating 10 nodes takes ~400ms. The application generates ~40 KB/s of network activity, so I don't think network throughput is the bottleneck. I don't see any bottleneck here at all, but it is so slow... And when I use 4 threads, every 10 nodes take ~4 * 400 ms. – Eugene D. Gubenkov Aug 23 '13 at 11:20
  • See this link: http://comments.gmane.org/gmane.comp.db.neo4j.user/11002. They are talking about more than 500 nodes per second, and they also use the REST API. – Eugene D. Gubenkov Aug 23 '13 at 11:38
  • I've added an example (above) using the HttpClient, which I believe is the way n4jclient does it under the hood. I don't know how they are achieving 500 nodes a second; with my (limited) knowledge of direct REST-based communication in .NET, it certainly doesn't seem obvious. I suspect Tatham will be able to shed light on the issue and explain it better than I have. – Charlotte Skardon Aug 23 '13 at 15:32
  • As it turned out, the poor performance is caused by the overhead of processing Cypher queries. See this article: http://www.rene-pickhardt.de/get-the-full-neo4j-power-by-using-the-core-java-api-for-traversing-your-graph-data-base-instead-of-cypher-query-language/ – Eugene D. Gubenkov Aug 25 '13 at 17:42