Create relationship on Neo4j using CSV files

Question

I want to create a simple DB using some CSV files, like this: attore.csv, film.csv, recita.csv.

I successfully created the nodes with the labels Attore and Film from simple files like this:

attore.csv:

    nome
    nome1
    nome2
    nome3

film.csv:

    titolo
    titolo1 
    titolo2 
    titolo3

and I was trying to create the relationship between them using recita.csv, in which each row is:

attore, film

Obviously my keys should be Attore(nome) and Film(titolo). I've been searching for a long time and found many code snippets, but none of them works; every attempt I made just ran for something like an hour.

This is what I did:

I created the film nodes:

USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///film.csv" AS row
CREATE (n:Film) 
SET n = row, n.titolo = (row.titolo), n.durata = (row.durata), 
n.genere = (row.genere), n.anno = (row.anno), n.descrizione = 
(row.descrizione), n.regista = (row.regista), 
n.studio_cinematografico = (row.studio_cinematografico)

Then I created the attore nodes:

USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
CREATE (n:Attore) 
SET n = row, n.nome = (row.nome)

Then, after many attempts, I thought this was the right way to create the relationships, but it didn't work:

USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
MATCH (attore:Attore {nome: row.attore})
MATCH (film:Film {titolo: row.film})
MERGE (attore)-[:RECITA]-(film);

I hope someone can tell me the right way to create the relationships, thanks.

EDIT: Examples of how my files are structured. attore.csv:

nome
Brendan Fraser
Bett Granstaff
Leslie Nielsen
Martina Gedeck
Martin Sheen

film.csv:

titolo   durata   genere   anno   descrizione   regista   studio_cin
Mortdecai    80    Action  2015   *something*   David Koepp  Liongate

recita.csv:

attore       film
Johnny Depp   Mortdecai
Jason Momoa   Braven

Can you verify that the Film and Attore nodes were created successfully with the required properties? – Rajendra Kadam, Aug 25 '19 at 05:06
Also, please share a few records from each file along with the header; otherwise it's really difficult to tell anything – Rajendra Kadam, Aug 25 '19 at 05:07
And if I try to create the nodes using @ArtemNazarenko's code (using MERGE), I get the same result as when I try to create the relationships, i.e. it runs and never terminates :/ – simone di tanna, Aug 25 '19 at 11:44
Do you see that the Film and Attore nodes were created successfully in the database? – Rajendra Kadam, Aug 25 '19 at 11:58
Yes, it created exactly 14428 nodes with the label Film (the number of rows in film.csv) and 34852 nodes with the label Attore (the number of rows in attore.csv), so the nodes should be all right. Can you confirm that the command I run to create the relationships is correct? If it is, and the nodes are okay, maybe I just have to let the command run for hours. Maybe it's a very expensive operation? I don't know what else it could be – simone di tanna, Aug 25 '19 at 12:36

Artem Nazarenko · Answer 1 · 2019-08-24T20:09:11.120

0

Instead of the approach you are using, I would recommend using MERGE instead of CREATE; this way you avoid duplicate nodes:

USING PERIODIC COMMIT 
LOAD CSV WITH HEADERS FROM "file:///attore.csv" AS row
MERGE (a:Attore{nome: row.nome})
RETURN a

The same applies to film.csv; just separate the properties with commas.
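For film.csv, a minimal sketch could look like the following. It is a variant of what is described above, not the exact form: instead of listing every property inside the MERGE pattern, it merges on titolo only and sets the remaining properties when the node is first created, so a film is not duplicated just because one of its other columns differs. The column names are taken from the SET clause in the question and are an assumption about the real header of film.csv.

```cypher
// Sketch only: merge each film on its key (titolo) and fill in the other
// columns when the node is created. Column names come from the question's
// SET clause; adjust them to match the actual CSV header.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///film.csv" AS row
MERGE (f:Film {titolo: row.titolo})
ON CREATE SET
  f.durata = row.durata,
  f.genere = row.genere,
  f.anno = row.anno,
  f.descrizione = row.descrizione,
  f.regista = row.regista,
  f.studio_cinematografico = row.studio_cinematografico;
```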

Second, regarding the format of your CSV files, check the CSV format documentation again. From what you have explained, for your code to work you need just two columns in recita.csv (attore, film), not six as you have (attore, film attore, film attore, film). They are identical, and each column name in the header must be unique, so you don't need to repeat attore and film three times.

Please check the headers of all your files, or expand your question with examples of your CSVs.

Try changing your recita.csv file to follow the CSV format requirements.

edited Aug 24 '19 at 20:09

answered Aug 24 '19 at 19:57

Artem Nazarenko


Sorry, I wasn't clear in my post. Obviously recita.csv has just two columns, attore and film; I meant that attore, film attore, film were pairs. So I can't understand what the error is. Using MERGE for node creation doesn't change the problem – simone di tanna Aug 24 '19 at 22:24
The query you provided, where you create the relationship, works well. As for the problem you suggested, the number of nodes and relationships that need to be created, try the following: leave just 100 lines in recita.csv and try to create the relationships, then run something like this to check whether they were created: RETURN (attore)-[:RECITA]-(film); – Artem Nazarenko Aug 25 '19 at 13:04
The idea is to check whether the problem really is the number of nodes and relationships – Artem Nazarenko Aug 25 '19 at 13:05
OK, I did it and 100 relationships were created correctly. So is it just that my PC can't handle this with roughly 80k rows in recita.csv? I will now let it run for some hours; if it doesn't finish, I will try on another, more powerful PC. Do you have any other idea? – simone di tanna Aug 25 '19 at 13:38
Check this link for some tests with timings and file sizes: https://dzone.com/articles/half-terabyte-benchmark-neo4j-vs-tigergraph – Artem Nazarenko Aug 25 '19 at 14:33
Your data is smaller in terms of edges and vertices, so probably let Neo4j run the query for an hour or, as you said, try later on a more powerful PC. – Artem Nazarenko Aug 25 '19 at 14:35
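The 100-row test discussed in the comments can be sketched without editing recita.csv, by limiting the rows inside the query. The two CREATE INDEX statements are an addition that the thread does not mention: without an index on Attore(nome) and Film(titolo), each MATCH generally has to scan all nodes with that label for every CSV row (here roughly 80k rows against 34852 and 14428 nodes), which is a common reason for imports like this to run for a very long time. Whether that is the cause in this case is an assumption. The index syntax shown is the Neo4j 3.x form; newer releases use CREATE INDEX FOR (a:Attore) ON (a.nome) instead.

```cypher
// 1) Assumption: index the lookup keys so each MATCH can use an index seek
//    rather than a label scan. Run each statement separately.
CREATE INDEX ON :Attore(nome);
CREATE INDEX ON :Film(titolo);

// 2) Create relationships for the first 100 rows only, as a quick test.
LOAD CSV WITH HEADERS FROM "file:///recita.csv" AS row
WITH row LIMIT 100
MATCH (attore:Attore {nome: row.attore})
MATCH (film:Film {titolo: row.film})
MERGE (attore)-[:RECITA]-(film);

// 3) Check that the relationships exist.
MATCH (attore:Attore)-[:RECITA]-(film:Film)
RETURN attore.nome, film.titolo
LIMIT 25;
```

If the limited run finishes quickly, the full import from the question can be run again unchanged once the indexes are online.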
