I have a problem creating triadic closures for a large number of nodes and relationships. I searched for an answer for hours, but nothing really matched my problem.
The dataset:
- 322276 nodes with label PERSON (with index on property name)
- 987052 nodes with label PRODUCTION
- 6417928 relationships of type PLAYS
- 14314487 relationships of type WORKS
The nodes are connected as follows:
- (:PERSON)-[:PLAYS]->(:PRODUCTION)
- (:PERSON)-[:WORKS]->(:PRODUCTION)
I want to create triadic closures between all persons, that is, connect any two persons who worked on or played in the same production with a new relationship of type [:WORKED_WITH]. To do so I wrote the following query:
MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
WHERE p1 <> p2
CREATE UNIQUE (p1)-[:WORKED_WITH]->(p2);
Instead of CREATE UNIQUE, I also tried MERGE, and I tried filtering with WHERE NOT (p1)-[:WORKED_WITH]->(p2). The problem is that the query does not finish even after 7 hours. I know this is a huge amount of data, but I hope there is a different way to do this much more quickly.
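For reference, here is a sketch of what those two variants look like. It is reconstructed for illustration rather than copied from my session, so details may differ; in particular, the plain CREATE in the second variant is an assumption.

```cypher
// Variant A (sketch): MERGE instead of CREATE UNIQUE
MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
WHERE p1 <> p2
MERGE (p1)-[:WORKED_WITH]->(p2);

// Variant B (sketch): skip pairs that are already connected.
// Using plain CREATE here is an assumption, not necessarily what was run.
MATCH (p1:PERSON)-[:WORKS|PLAYS*2..2]-(p2:PERSON)
WHERE p1 <> p2 AND NOT (p1)-[:WORKED_WITH]->(p2)
CREATE (p1)-[:WORKED_WITH]->(p2);
```

Both variants still expand every two-hop path between persons in a single transaction, which is probably why neither finished any sooner.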
Do you have any idea what to do?
Some more information:
- Neo4j 3.1.4 Community Edition
- Windows 10
- Quad Core i5
- 8GB RAM DDR3
- database located on an SSD
- I did not change the default Neo4j config
I also thought about using the traversal API, but I don't know how to do this (or whether it would help). I have already read books by Michael Hunger, Vukotic/Watt, Panzarino, and others, studied the official docs, and read many answers on Stack Overflow, but did not find anything useful. I hope you can help me.
Best Wishes, Wolfgang