How to add edges (relationships) in Neo4j to a very big graph

Question

I have a simple graph model. Each node has an attribute {NodeId}, and each edge links two nodes and has no attributes of its own. It is a directed graph with about 10^6 nodes.

Here is my situation: I first created an index on the {NodeId} attribute, then created 10^6 nodes. At this point I have a graph with 10^6 nodes and no edges. When I randomly add edges, it is very slow: I can only add about 40 edges per second.

Did I miss any configurations? I don't think this is a reasonable speed.

The code for adding edges:

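// Looks up both endpoints by Id, then creates one directed edge between them.
public static void addAnEdge(GraphClient client, Node a, Node b)
{
    client.Cypher
        .Match("(node1:Node)", "(node2:Node)")
        .Where((Node node1) => node1.Id == a.Id)
        .AndWhere((Node node2) => node2.Id == b.Id)
        // Parenthesized node patterns; the bare node1-[:Edge]->node2 form is rejected by newer Neo4j versions.
        .Create("(node1)-[:Edge]->(node2)")
        .ExecuteWithoutResults();
}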

Should I add an index on edges? If so, how do I do it in Neo4jClient? Thanks for your help.

Batching all my queries into one transaction is a good idea. I executed the following statement in the Neo4j browser (http://localhost:7474):

MATCH (user1:Node), (user2:Node)
WHERE user1.Id >= 5000000 and user1.Id <= 5000100 and user2.Id >= 5000000 and user2.Id <= 5000100
CREATE (user1)-[:Edge]->(user2)

This statement creates about 10,000 edges in one transaction (101 × 101 = 10,201 pairs), so I think the HTTP overhead is no longer significant. The result is:

Created 10201 relationships, statement executed in 322969 ms.

That means I add only about 30 edges per second.

Are you issuing a separate request for every edge? If so, you pay the HTTP overhead on every edge creation. You should batch your statements in transactions. — Christophe Willemsen, Nov 02 '15 at 10:58
I don't know anything about neo4jclient, but in general you could try to group queries in transactions: https://github.com/Readify/Neo4jClient/wiki/Transactions — Martin Preusse, Nov 02 '15 at 10:58
Thanks for your reply. I want to provide an interface that adds one edge at a time, so I don't want to batch them into one transaction. Is the HTTP overhead really that serious if I use a local Neo4j? @ChristopheWillemsen — IamVeryClever, Nov 02 '15 at 11:28
I tried another way to reduce the http overhead. Could you please see my modification? Thank you @ChristopheWillemsen — IamVeryClever, Nov 02 '15 at 11:47

score 3 · Accepted Answer · edited Nov 02 '15 at 15:45

The ideal solution is to pass the pairs of nodes to be related in a single parameter map. With UNWIND you can then iterate over those pairs and create a relationship for each one. This performs well as long as you have an index on the Id property of the Node nodes.

I don't know how you can do it with Neo4jClient, but here is the Cypher statement:

UNWIND {pairs} as pair
MATCH (a:Node), (b:Node)
WHERE a.Id = pair.start AND b.Id = pair.end
CREATE (a)-[:EDGE]->(b)

The parameters to be sent along with the query should have this form:

{
  "parameters": {
    "pairs": [
      {
        "start": 1,
        "end": 2
      },
      {
        "start": 3,
        "end": 4
      }
    ]
  }
}
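
The {pairs} form used above is the older Cypher parameter syntax. As far as I know, Neo4j 3.0 introduced the $name form and Neo4j 4.0 dropped the curly-brace form, so on a newer server the same statement would look roughly like the sketch below. Only the parameter reference changes; the parameter map sent with the query stays the same.

```cypher
// Same statement as above, written with the $param syntax.
// "pairs" is the parameter name from the map shown above.
UNWIND $pairs AS pair
MATCH (a:Node), (b:Node)
WHERE a.Id = pair.start AND b.Id = pair.end
CREATE (a)-[:EDGE]->(b)
```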

UPDATE

The Neo4jClient author kindly gave me the equivalent code in Neo4jClient:

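// Each anonymous object is one (start, end) pair of node Ids to connect.
var parameters = new[] {
    new { start = 1, end = 2 },
    new { start = 3, end = 4 }
};

// One request: UNWIND iterates over the pairs and creates an edge for each.
client.Cypher
    .Unwind(parameters, "pair")
    .Match("(a:Node),(b:Node)")
    .Where("a.Id = pair.start AND b.Id = pair.end")
    .Create("(a)-[:EDGE]->(b)")
    .ExecuteWithoutResults();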

Thanks for your help! It's my first question in stackoverflow and I appreciate all you guys. Thank you for your Neo4jClient solution! — IamVeryClever, Nov 03 '15 at 03:12
Do you have to use `UNWIND` here or is it possible to reformulate the query and have Cypher go over a list automatically (as in http://neo4j.com/docs/stable/cypher-parameters.html#_create_multiple_nodes_with_properties)? — Martin Preusse, Dec 01 '15 at 16:13

score 0 · Answer 2 · edited May 23 '17 at 12:04

In your updated Cypher query, you MATCH a cartesian product of all your nodes. That is very slow. Have a look at the EXPLAIN of your query.
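
A minimal sketch of that check, using the statement from the question: prefixing it with EXPLAIN returns the execution plan without running the statement, so no relationships are created. If the plan shows a CartesianProduct operator fed by label scans rather than index seeks, that would point to the slowdown described here. The exact operator names depend on the Neo4j version.

```cypher
// EXPLAIN only plans the statement; it does not execute it.
EXPLAIN
MATCH (user1:Node), (user2:Node)
WHERE user1.Id >= 5000000 AND user1.Id <= 5000100
  AND user2.Id >= 5000000 AND user2.Id <= 5000100
CREATE (user1)-[:Edge]->(user2)
```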

See also this question for an explanation of how to deal with cartesian products: Why does neo4j warn: "This query builds a cartesian product between disconnected patterns"?

Do you have an index on the Id property? Ideally, you should use a uniqueness constraint. This automatically adds a very fast index.
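
As a sketch, assuming the label Node and the property Id from the question, a uniqueness constraint can be created with a statement like the one below. This is the syntax of the Neo4j 2.x and 3.x releases. Newer releases use a different form, noted in the comment, so check the documentation for your version.

```cypher
// Uniqueness constraint on Node.Id; Neo4j backs it with an index.
// Newer Neo4j versions spell this as:
//   CREATE CONSTRAINT FOR (n:Node) REQUIRE n.Id IS UNIQUE
CREATE CONSTRAINT ON (n:Node) ASSERT n.Id IS UNIQUE
```

The statement fails if existing nodes already share an Id value, so it is easiest to create the constraint before loading the nodes.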

In your query, try to first MATCH the first nodes, use WITH to collect them in a list and then MATCH the second batch of nodes:

MATCH (user1:Node)
WHERE user1.Id >= 50000 and user1.Id <= 50100
WITH collect(user1) as list1
MATCH (user2:Node)
WHERE user2.Id >= 50000 and user2.Id <= 50100
UNWIND list1 as user1
CREATE (user1)-[:EDGE]->(user2)

Sorry for my late reply. Thanks for your help! That's really helpful, because I didn't realize my statement would build a cartesian product of all nodes. I thought Neo4j would optimize it... And thanks for your references! They help me a lot. — IamVeryClever, Nov 03 '15 at 03:08
