Load a huge CSV with multiple relationship types

Question

I am trying to load a CSV file into a Neo4j database where the file contains different types of edges between nodes. I would like to load all the different types of edges from one file in one query (as opposed to breaking the file into separate files each containing a different type of edge). For instance:

Source|Target|Relationship
x1|y1|Creates
x2|y2|Uses
x3|y1|Uses

Cypher rejects the following load query, because the relationship type in a pattern cannot be read from an expression such as line.Relationship:

LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
CREATE (x)-[:line.Relationship]->(y)

As suggested here, I can use APOC instead, as follows:

LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
CALL apoc.create.relationship(x, line.Relationship, {}, y) YIELD rel
RETURN *

However, this performs very slowly when run on a large scale (50,000) compared to the first example, and I suspect it is related to `YIELD rel RETURN *`. I am using Neo4j's .NET driver, and executing this query returns a list of all the edges it has created.

Naively dropping YIELD or RETURN results in errors such as the following (see this for some explanation):

Query cannot conclude with CALL together with YIELD

So, I was wondering how best to improve the above query, ideally without having to return or yield any of the results.

score 0 · Answer 1 · edited Jun 20 '22 at 15:42

You can simply return a constant string, so that the query yields a single row instead of one row per relationship:

LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
CALL apoc.create.relationship(x, line.Relationship, {}, y) YIELD rel
RETURN distinct 'done'
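
A possible variation, offered as a sketch rather than a tested recommendation: aggregating with `count(rel)` also collapses the result to a single row, and it additionally reports how many relationships were created. The alias `created` is only an illustrative name, and the property key `Relationship` is assumed to match the CSV header exactly (header names are case-sensitive).

```cypher
// Same load as above, but return one aggregated row instead of a constant.
LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
CALL apoc.create.relationship(x, line.Relationship, {}, y) YIELD rel
// count(rel) aggregates over all rows, so the driver receives a single record.
RETURN count(rel) AS created
```

Either way, the caller receives one record rather than one per edge.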

You can also use USING PERIODIC COMMIT to batch your transactions when loading large CSVs:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
CALL apoc.create.relationship(x, line.Relationship, {}, y) YIELD rel
RETURN distinct 'done'
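
Note that `USING PERIODIC COMMIT` was deprecated in Neo4j 4.4 and removed in Neo4j 5, where batching is written with `CALL { ... } IN TRANSACTIONS` instead. Below is a minimal sketch, assuming Neo4j 4.4 or later with APOC installed, and assuming the query runs in an auto-commit (implicit) transaction; in Neo4j Browser that means prefixing it with `:auto`. The batch size of 10000 is carried over from the query above, and the aliases `created` and `relationships_created` are illustrative names.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
CALL {
  // Import the current CSV row into the subquery scope.
  WITH line
  MERGE (x:Node {name: line.Source})
  MERGE (y:Node {name: line.Target})
  CALL apoc.create.relationship(x, line.Relationship, {}, y) YIELD rel
  // The subquery runs once per row, so this is 1 for each row processed.
  RETURN count(rel) AS created
} IN TRANSACTIONS OF 10000 ROWS
// Collapse the per-row counts into a single result record.
RETURN sum(created) AS relationships_created
```

Each batch of 10000 rows is committed separately, so the changes held in memory at any one time should be bounded by the batch size rather than by the size of the file.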

edited Jun 20 '22 at 15:42

Dr. Strangelove

answered Jun 20 '22 at 07:48

Tomaž Bratanič

Would `yield` not result in creating the list of relationships, even though you're not returning them to the caller? – Dr. Strangelove Jun 20 '22 at 15:42
Yes, but that is not a problem. – Tomaž Bratanič Jun 20 '22 at 16:29
Could you please elaborate? E.g., suppose I have 1 billion relationships. I assume `yield`ing them all stores them all in memory, so the machine running Neo4j needs enough memory to fit them all. Am I missing something? – Dr. Strangelove Jun 20 '22 at 16:51
You couldn't really use this for importing a billion relationships... but that's why you use batching, so you don't run out of memory. – Tomaž Bratanič Jun 20 '22 at 17:10
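
If the set of relationship types is small and known in advance, the `YIELD` question raised in the comments can be sidestepped without APOC. Here is a sketch, assuming only the two types from the sample data (`Creates` and `Uses`): each `FOREACH` iterates over a one-element list when the row's type matches and over an empty list otherwise, so the relationship type can be written literally in the pattern. The loop variable `ignored` is a placeholder name.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///filename.csv' AS line FIELDTERMINATOR '|'
MERGE (x:Node {name: line.Source})
MERGE (y:Node {name: line.Target})
// Runs the CREATE once if the row's type is 'Creates', zero times otherwise.
FOREACH (ignored IN CASE WHEN line.Relationship = 'Creates' THEN [1] ELSE [] END |
  CREATE (x)-[:Creates]->(y))
// Same conditional trick for 'Uses'.
FOREACH (ignored IN CASE WHEN line.Relationship = 'Uses' THEN [1] ELSE [] END |
  CREATE (x)-[:Uses]->(y))
```

Because the query ends in an update clause, no `RETURN` is needed. Rows with any other relationship type are silently skipped, so this does not generalize to arbitrary types the way `apoc.create.relationship` does.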
