create and infer relationships based on ID when reading in a csv in neo4j

Question

I have a simple csv, with 4 lines, that looks like this:

+------------+-------------+------------+-------------+
|     ID     |      Name   |  FatherID  |  MotherID   |
+------------+-------------+------------+-------------+
| 1          | Mom Doe     |            |             |
| 2          | Dad Doe     |            |             |
| 3          | Big Sis     |  2         |  1          |
| 4          | Lil Bro     |  2         |  1          |
+------------+-------------+------------+-------------+

I am trying to make a family tree, which looks like this:

The trick here is that I have to create these relationships based on only two things: the FatherID and the MotherID. Which is doeable. But it requires applying a relationship in some conditional way.

Here's what I tried, which didn't work:

LOAD CSV WITH HEADERS FROM
'file:///Users/.../import_for_Neo4j.csv' AS line
WITH line
CREATE (person:Person {id:line.ID})
SET person.Name=line.Name,
    person.MotherID=line.MotherID,
    person.FatherID=line.FatherID
WITH person
CREATE (a:Person {Name:'Mom Doe'})-[:SPOUSE]->(b:Person {Name:'Dad Doe'})
RETURN a

But then I realized, that even if this did actually work, what would be the point? I'd have to go in an hand-type the names for every family member, which would negate the whole point of loading a csv in the first place. If that was the case, I may as well just type everything up in Sublime by hand and skip reading a csv.

One idea I had was to take anyone who has a NULL for FatherID and MotherID become related as SPOUSE, but that wouldn't work if the family tree had grandparents.

A solution seems really tricky -- maybe first create all the nodes and create the SIBLING relationships. Then iterate over the csv and produce the CHILD relationships?

Is there any way I can generally slurp up a csv and create this simple graph?

Thanks for reading this.

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

The SIBLING relationship is unneeded, since you can determine the sibling relationship just by matching persons who share a parent.

Here is an approach that might work for your needs.

Note: I chose to use FATHER and MOTHER relationships instead of CHILD, to retain that semantic information from the CSV file. You can simplify my answer to just use CHILD if you choose.

Step 1: Generate all the `Person` nodes

LOAD CSV WITH HEADERS FROM 'file:///Users/.../import_for_Neo4j.csv' AS line
MERGE (p:Person {id: line.ID, name: line.Name});

Note that I use MERGE instead of CREATE, to avoid creating duplicates.

Step 2: Generate all the relationships (FATHER, MOTHER, SPOUSE)

LOAD CSV WITH HEADERS FROM 'file:///Users/.../import_for_Neo4j.csv' AS line
MATCH (p:Person {id: line.ID})
WITH p, line
OPTIONAL MATCH (m:Person {id: line.MotherID})
FOREACH (x IN CASE WHEN m IS NULL THEN [] ELSE [1] END | MERGE (p)-[:MOTHER]->(m))
WITH p, m, line
OPTIONAL MATCH (f:Person {id: line.FatherID})
WITH p, m, f
FOREACH (x IN CASE WHEN f IS NULL THEN [] ELSE [1] END | MERGE (p)-[:FATHER]->(f))
FOREACH (y IN CASE WHEN m IS NULL OR f IS NULL THEN [] ELSE [1] END | MERGE (m)-[:SPOUSE]->(f))

Here is what the result looks like, with your sample data:

Finding siblings

Here is how you'd find all the siblings of "Lil Bro":

MATCH (child:Person {name:'Lil Bro'})-[:MOTHER|FATHER]->()<-[:MOTHER|FATHER]-(sibling)
RETURN child, COLLECT(DISTINCT sibling)

Máté Thurzó · Answer 2 · 2016-05-04T18:07:05.920

This is far from efficient depending on your data set this will require tuning a lot, but for this basic dataset it's working:

LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
WITH line
CREATE (p:Person {id: line.id})
SET p.name = line.name, p.motherId = line.motherId, p.fatherId = line.fatherId
WITH p
MATCH (p1:Person), (p2:Person)
WHERE p.fatherId = p1.id AND p.motherId = p2.id
MERGE (p1)-[:SPOUSE]->(p2);

The efficiency problem will come from the cartesian product in the MATCH part.

There is one more trick I've added, I've filled up the empty father and mother IDs with zeros.

query result screenshot

Updated

With children parent relationship:

LOAD CSV WITH HEADERS FROM
'file:///test.csv' AS line
WITH line
CREATE (p:Person {id: line.id})
SET p.name = line.name, p.motherId = line.motherId, p.fatherId = line.fatherId
WITH p
MATCH (p1:Person), (p2:Person)
WHERE p.fatherId = p1.id AND p.motherId = p2.id
MERGE (p1)-[:SPOUSE]->(p2)
WITH p
MATCH (father:Person)
WHERE p.fatherId = father.id
MERGE (p)-[:PARENT {type: 'FATHER'}]->(father)
WITH p
MATCH (mother:Person)
WHERE p.motherId = mother.id
MERGE (p)-[:PARENT {type: 'MOTHER'}]->(mother);

This is a really good start, thanks for coding this up. Do you know how we can denote that the children are related to the parents as well? — Monica Heddneck, May 04 '16 at 06:54

create and infer relationships based on ID when reading in a csv in neo4j

2 Answers2

Step 1: Generate all the Person nodes

Step 2: Generate all the relationships (FATHER, MOTHER, SPOUSE)

Finding siblings

Step 1: Generate all the `Person` nodes