
I have a problem creating triadic closures for a huge number of nodes and relationships. I have been searching for an answer for hours, but nothing really matched my problem.


The dataset:

  • 322276 nodes with label PERSON (with an index on the property name)
  • 987052 nodes with label PRODUCTION
  • 6417928 relationships of type PLAYS
  • 14314487 relationships of type WORKS

The nodes are connected as follows:

  • (:PERSON)-[:PLAYS]->(:PRODUCTION)
  • (:PERSON)-[:WORKS]->(:PRODUCTION)

I want to create triadic closures between all persons, i.e. connect any two persons who worked on or played in the same production with a new relationship of type [:WORKED_WITH]. To do so I wrote the following query:

MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
WHERE p1 <> p2
CREATE UNIQUE (p1)-[:WORKED_WITH]->(p2);

Instead of CREATE UNIQUE I also tried MERGE, and I tried adding WHERE NOT (p1)-[:WORKED_WITH]->(p2). The problem is that even after 7 hours the query does not finish... I know this is a huge amount of data, but I hope there is a different way to do this much more quickly...
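
The MERGE variant looked roughly like this:

MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
WHERE p1 <> p2 AND NOT (p1)-[:WORKED_WITH]->(p2)
MERGE (p1)-[:WORKED_WITH]->(p2);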

Do you have any idea what to do?

Some more information:


  • Neo4j 3.1.4 Community Edition
  • Windows 10
  • Quad Core i5
  • 8GB RAM DDR3
  • database located on an SSD
  • I did not change the default Neo4j config

I also thought about using the traversal API, but I don't know how to do this (and I don't know whether it would help)... I have already read books by Michael Hunger, Vukotic/Watt, Panzarino, etc., studied the official docs and read many answers on Stack Overflow, but did not find anything that solved this. I hope you can help me.


Best Wishes, Wolfgang

  • (1) When you used `MERGE (p1)-[:WORKED_WITH]->(p2)`, you should NOT have also used `WHERE NOT (p1)-[:WORKED_WITH]->(p2)`, since `MERGE` automatically does the same test, duplicating the effort. (2) Do you actually *need* the redundant `WORKED_WITH` relationship, which requires much more code complexity and storage in order to add those relationships whenever related changes are made to the DB? Couldn't you just use something like your existing `MATCH` and `WHERE` clauses to find who worked with whom as needed? – cybersam Jun 21 '17 at 05:40
  • Well, I also tried MERGE without that WHERE part (which also took many days). I do want to do things like clustering, and for that I need a similarity (e.g. Jaccard) for every pair of nodes... It just does not perform well if I don't create those edges (with a weight)... – Wolfgang G. Jun 21 '17 at 05:52

2 Answers

2

When refactoring or updating a big graph you want to use batching. The APOC library provides this with the apoc.periodic procedures.

In your example that would look like:

call apoc.periodic.commit("
  MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
  WHERE id(p1) < id(p2) AND NOT (p1)-[:WORKED_WITH]-(p2)
  WITH p1, p2 LIMIT {limit}
  MERGE (p1)-[:WORKED_WITH]-(p2)
  RETURN count(*)
", {limit: 5000})
Tomaž Bratanič
  • Thank you very much! I read about the APOC library in an answer by Michael Hunger, but I did not know it can commit periodically. I have to admit that I thought about using APOC to export to CSV and import that using Cypher, but I thought that maybe there is a way provided by Neo4j itself... Am I right that there isn't? – Wolfgang G. Jun 01 '17 at 10:54
  • No, for now I haven't seen any good native Cypher batching options. `apoc.periodic.iterate` and `apoc.periodic.commit` fit nicely for batching – Tomaž Bratanič Jun 01 '17 at 10:58
  • Thank you very much for your support. I will try that and share my result. – Wolfgang G. Jun 01 '17 at 11:01
  • apoc.periodic.commit works fine for my issue, although it takes a lot of time (which is not surprising given the size of my graph). Thank you again! – Wolfgang G. Jun 02 '17 at 07:06
1

In case someone ever reads this question: using the APOC library did not solve my problem satisfactorily... It would have taken about a month.

So I decided to run a SQL statement to export the triadic closures to a CSV file (in Neo4j I had created unique constraints on the IDs from the SQL tables). That CSV data is easily imported via bulk loading. Altogether it took me less than 3 hours, so I would say this is the most efficient way to solve it.
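
The import can be done with LOAD CSV and periodic commits; roughly like this (the file name, the CSV headers and the sqlId property are only illustrative, they depend on your export):

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///worked_with.csv" AS row
MATCH (p1:PERSON {sqlId: toInt(row.id1)})
MATCH (p2:PERSON {sqlId: toInt(row.id2)})
CREATE (p1)-[:WORKED_WITH]->(p2);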