
I am wondering why my Cypher query is taking an exorbitant amount of time.

Basically, I have a small family tree (two families), and I'm trying to add to each one a new node that carries some small bit of metadata so that the families are easier to keep isolated from each other when they are queried. (Thanks to @Tim Kuehn for this advice).

Once I run the query to populate my two families, I have this, which is built quickly with no problems:

enter image description here

Next, I want to create the aforementioned new nodes. The first node is created quickly, applied to the smaller family (I call them family B):

// 'add a :Family node for each relational group, like so:'

CREATE (famB:Family) 
WITH famB
MATCH (a:Person {name:"Gramps Johnson"})-[:RELATED_TO*]->(b:Person)  
MERGE (famB:Family)<-[:FAMILY]-(a) 
MERGE (famB:Family)<-[:FAMILY]-(b) 

...which gives me this. So far so good!

enter image description here

Moving forward, however, the slightly larger family's node is never created for some reason. The code is the same, but the query just runs and runs...

enter image description here

// 'add a :Family node for each relational group, like so:'

CREATE (famA:Family) 
WITH famA
MATCH (a:Person {name:"Gramps Doe"})-[:RELATED_TO*]->(b:Person)  
MERGE (famA:Family)<-[:FAMILY]-(a) 
MERGE (famA:Family)<-[:FAMILY]-(b)

Why would this happen?

My first idea was to put an index on the name property:

// put an index on the name property of the Person nodes:
// CREATE INDEX ON :Person(name)  

but that didn't do anything.

So I tried to look at the EXPLAIN but it didn't really tell me anything. (It also runs forever on the terminal itself when executed.)

enter image description here

Thanks for your help.

Here's my code to create the graph:

// FAMILY A1: create grandparents, their son.

CREATE (grampsdoe:Person {name: 'Gramps Doe', id:'1', Gender:'Male', Diagnosis: 'Alzheimers', `Is Alive?`: 'No', Handedness: 'Left', `Risk Score`: 'PURPLE'})
CREATE (gramsdoe:Person {name: 'Grams Doe', id:'2', Gender:'Female', Diagnosis: 'Alzheimers', `Is Alive?`: 'No', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})
CREATE (daddoe:Person {name: 'Dad Doe', id:'3', Gender:'Male', Diagnosis: 'MCI', `Is Alive?`: 'No', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})

CREATE
(grampsdoe)-[:RELATED_TO {relationship: 'Husband'}]->(gramsdoe),
(gramsdoe)-[:RELATED_TO {relationship: 'Wife'}]->(grampsdoe),
(grampsdoe)-[:RELATED_TO {relationship: 'Father'}]->(daddoe),
(gramsdoe)-[:RELATED_TO {relationship: 'Mother'}]->(daddoe),
(daddoe)-[:RELATED_TO {relationship: 'Son'}]->(grampsdoe),
(daddoe)-[:RELATED_TO {relationship: 'Son'}]->(gramsdoe)


// FAMILY A2: create grandparents, their daughter

CREATE (grampssmith:Person {name: 'Gramps Smith', id:'4', Gender:'Male', Diagnosis: 'Normal', `Is Alive?`: 'No', Handedness: 'Left', `Risk Score`: 'PURPLE'})
CREATE (gramssmith:Person {name: 'Grams Smith', id:'5', Gender:'Female', Diagnosis: 'Alzheimers', `Is Alive?`: 'No', Handedness: 'Ambidextrous', `Risk Score`: 'PURPLE'})
CREATE (momsmith:Person {name: 'Mom Doe', id:'6', Gender:'Female', Diagnosis: 'Alzheimers', `Is Alive?`: 'No', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})

CREATE
(grampssmith)-[:RELATED_TO {relationship: 'Husband'}]->(gramssmith),
(gramssmith)-[:RELATED_TO {relationship: 'Wife'}]->(grampssmith),
(grampssmith)-[:RELATED_TO {relationship: 'Father'}]->(momsmith),
(gramssmith)-[:RELATED_TO {relationship: 'Mother'}]->(momsmith),
(momsmith)-[:RELATED_TO {relationship: 'Daughter'}]->(grampssmith),
(momsmith)-[:RELATED_TO {relationship: 'Daughter'}]->(gramssmith)


// FAMILY A3: 'Dad Doe' and 'Mom Smith' get married and have 2 kids who are twins
CREATE (lilbro:Person {name: 'Lil Bro', id:'7', Gender:'Male', Diagnosis: 'Normal', `Is Alive?`: 'Yes', Handedness: 'Right', `Risk Score`: 'PURPLE'})
CREATE (bigsis:Person {name: 'Big Sis', id:'8', Gender:'Female', Diagnosis: 'Normal', `Is Alive?`: 'Yes', Handedness: 'Right', `Risk Score`: 'PURPLE'})

CREATE (daddoe)-[:RELATED_TO {relationship: 'Husband'}]->(momsmith)
CREATE (momsmith)-[:RELATED_TO {relationship: 'Wife'}]->(daddoe) 

CREATE (lilbro)-[:RELATED_TO {relationship: 'Brother'}]->(bigsis)

CREATE
(lilbro)-[:RELATED_TO {relationship: 'Grandson'}]->(grampsdoe),
(grampsdoe)-[:RELATED_TO {relationship: 'Grandfather'}]->(lilbro),
(lilbro)-[:RELATED_TO {relationship: 'Grandson'}]->(grampssmith),
(grampssmith)-[:RELATED_TO {relationship: 'Grandfather'}]->(lilbro),

(lilbro)-[:RELATED_TO {relationship: 'Grandson'}]->(gramsdoe),
(gramsdoe)-[:RELATED_TO {relationship: 'Grandmother'}]->(lilbro),
(lilbro)-[:RELATED_TO {relationship: 'Grandson'}]->(gramssmith),
(gramssmith)-[:RELATED_TO {relationship: 'Grandmother'}]->(lilbro),


(lilbro)-[:RELATED_TO {relationship: 'Son'}]->(daddoe),
(daddoe)-[:RELATED_TO {relationship: 'Father'}]->(lilbro),
(lilbro)-[:RELATED_TO {relationship: 'Son'}]->(momsmith),
(momsmith)-[:RELATED_TO {relationship: 'Mother'}]->(lilbro),

(bigsis)-[:RELATED_TO {relationship: 'Sister'}]->(lilbro),

(bigsis)-[:RELATED_TO {relationship: 'Granddaughter'}]->(grampsdoe),
(grampsdoe)-[:RELATED_TO {relationship: 'Grandfather'}]->(bigsis),
(bigsis)-[:RELATED_TO {relationship: 'Granddaughter'}]->(grampssmith),
(grampssmith)-[:RELATED_TO {relationship: 'Grandfather'}]->(bigsis),

(bigsis)-[:RELATED_TO {relationship: 'Granddaughter'}]->(gramsdoe),
(gramsdoe)-[:RELATED_TO {relationship: 'Grandmother'}]->(bigsis),
(bigsis)-[:RELATED_TO {relationship: 'Granddaughter'}]->(gramssmith),
(gramssmith)-[:RELATED_TO {relationship: 'Grandmother'}]->(bigsis),


(bigsis)-[:RELATED_TO {relationship: 'Daughter'}]->(daddoe),
(daddoe)-[:RELATED_TO {relationship: 'Father'}]->(bigsis),
(bigsis)-[:RELATED_TO {relationship: 'Daughter'}]->(momsmith),
(momsmith)-[:RELATED_TO {relationship: 'Mother'}]->(bigsis)



// FAMILY B1: create grandparents, their son.

CREATE (grampsjohnson:Person {name: 'Gramps Johnson', id:'9', Gender:'Male', Diagnosis: 'Normal', `Is Alive?`: 'No', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})
CREATE (gramsjohnson:Person {name: 'Grams Johnson', id:'10', Gender:'Female', Diagnosis: 'Normal', `Is Alive?`: 'No', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})
CREATE (johnjohnson:Person {name: 'John Johnson', id:'11', Gender:'Male', Diagnosis: 'MCI', `Is Alive?`: 'Yes', Handedness: 'Right', `Risk Score`: 'GIRAFFE'})

CREATE
(grampsjohnson)-[:RELATED_TO {relationship: 'Husband'}]->(gramsjohnson),
(gramsjohnson)-[:RELATED_TO {relationship: 'Wife'}]->(grampsjohnson),
(grampsjohnson)-[:RELATED_TO {relationship: 'Father'}]->(johnjohnson),
(gramsjohnson)-[:RELATED_TO {relationship: 'Mother'}]->(johnjohnson),
(johnjohnson)-[:RELATED_TO {relationship: 'Son'}]->(grampsjohnson),
(johnjohnson)-[:RELATED_TO {relationship: 'Son'}]->(gramsjohnson)
Monica Heddneck

1 Answer


Why would this happen?

This happened because the second family is no longer a simple loop; it is an "everyone connected twice to everyone" graph. That means this part of the "make a family node" code:

MATCH (a:Person {name:"Gramps Doe"})-[:RELATED_TO*]->(b:Person)  

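was tracing a huge number of paths, and the system stalled as a result. An unbounded [:RELATED_TO*] pattern produces one row per distinct path rather than one row per distinct person, so a densely connected group yields far more rows than it has members.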
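To see how quickly the number of paths grows, the same pattern can be run with a small upper bound and grouped by path length. This is only a minimal sketch, assuming the graph from the question is loaded; the bound of 4 and the aliases hops, paths and distinctPeople are illustrative choices, not part of the original query.

```cypher
// Count how many paths the variable-length pattern yields at each depth,
// compared with how many distinct people those paths actually reach.
// The upper bound of 4 is an arbitrary choice to keep the query cheap.
MATCH p = (a:Person {name: 'Gramps Doe'})-[:RELATED_TO*1..4]->(b:Person)
RETURN length(p) AS hops,
       count(p) AS paths,
       count(DISTINCT b) AS distinctPeople
ORDER BY hops
```

If paths keeps climbing from one depth to the next while distinctPeople stays small, most of the work is going into reaching people who were already found by a shorter route.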

Since there are 8 nodes in the target group, I restricted the path length to a range of 1 to 8 hops ([:RELATED_TO*1..8]):

CREATE (famA:Family) 
WITH famA
MATCH (a:Person {name:"Gramps Doe"})-[:RELATED_TO*1..8]->(b:Person)  
MERGE (famA:Family)<-[:FAMILY]-(a) 
MERGE (famA:Family)<-[:FAMILY]-(b)

and that ran to completion.
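A possible variation, offered as a sketch rather than as the query that was actually run: collapse the matched paths to a distinct list of people before writing, so that MERGE executes once per person instead of once per path. It also refers to the already-bound famA without restating its label, since recent Neo4j versions may reject a label on a variable that is already bound. The alias members and the variable m are illustrative names.

```cypher
// Create the family node once, then gather the distinct people reachable
// from the starting person within 8 hops.
CREATE (famA:Family)
WITH famA
MATCH (a:Person {name: 'Gramps Doe'})-[:RELATED_TO*1..8]->(b:Person)
// collect(DISTINCT ...) removes the duplicates produced by multiple paths;
// a person appearing in both lists is harmless because MERGE will not
// create a second :FAMILY relationship to the same family node.
WITH famA, collect(DISTINCT a) + collect(DISTINCT b) AS members
UNWIND members AS m
MERGE (famA)<-[:FAMILY]-(m)
```

This does not reduce the number of paths the MATCH has to enumerate, so the hop limit is still what keeps the query from stalling; it only cuts down the repeated write attempts afterwards.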

To get the entire family where a disease has shown up a certain number of times:

// count the family members with a disease
MATCH (f:Family)<-[:FAMILY]-(person:Person) 
WHERE person.Diagnosis = "Alzheimers" 
WITH f, count(person) AS Count 
WHERE Count > 2 

// Then report the family members as a single collection
MATCH (a:Person)-[r1:FAMILY]-(f)
RETURN collect(DISTINCT a)
Tim Kuehn
  • ...genius. So the first family that was being created correctly was still an "everyone connected twice to everyone" graph as well...but the number of nodes was small enough? I'm starting to think that this 'double connected' graph is overkill. THANKS!! – Monica Heddneck Apr 21 '16 at 23:28
  • Correct - there weren't that many paths in the smaller family group so it was able to complete in a reasonable period of time. If you're going for a different structure, I'd suggest parent one way and child the other. This would allow you to trace family relationships and not get caught up in these spider webs of connections. – Tim Kuehn Apr 21 '16 at 23:30
  • Yes, there are too many connections. Even with the fix it took 4 seconds for 9 nodes. I'd be worried about scaling it to something as small as 1,000 nodes (esp if those 1,000 nodes all were separated into, say 200 families). – Monica Heddneck Apr 21 '16 at 23:32
  • If this is for commercial use, I can help with that. – Tim Kuehn Apr 21 '16 at 23:36
  • Although this solution is very nice, it does not always work if some neighboring nodes are not bidirectionally-linked. For example, suppose some new node has the `x` identifier, and `(x)-[:RELATED_TO]->(a)` is in the DB, but the reverse relationship is not. Although you can use an undirected relationship pattern in the `MATCH`, that will make it very likely the query will run forever. This seems like a very difficult problem in general, and may need new Cypher support to address in a satisfactory way. – cybersam Apr 22 '16 at 00:20
  • unrelated to what cybersam said, I just ran this: MATCH p = (f:Family)<-[:FAMILY]-(person:Person) WHERE person.Diagnosis = "Alzheimers" WITH f, count(person) AS Count WHERE Count > 2 MATCH (a:Person)<-[r1:RELATED_TO]-(b:Person)-[r2:RELATED_TO*]-> (c) WHERE (a )-[:FAMILY]-(f) AND a = c RETURN a, r1, b, r2 limit 1 and noticed that 'Gramps Smith' was missing from the full family tree result. – Monica Heddneck Apr 22 '16 at 00:34
  • @cybersam - something that'd be cool would be a "get the graph of everything connected to this node and its connections" w/out the system mapping every single path in the cluster. – Tim Kuehn Apr 22 '16 at 02:02
  • @MonicaHeddneck - I've posted code that does what you're looking for. – Tim Kuehn Apr 22 '16 at 03:00
  • @TimKuehn: "get the graph of everything connected to this node and its connections" w/out the system mapping every single path in the cluster. In my naivete, I thought this would be a simple thing to do -- now I realize that Cypher will search everywhere. – Monica Heddneck Apr 22 '16 at 04:55
  • Using the "family" node solves the problem, I'd like a solution that didn't require that. BTW - if this answered your question, can you mark it as the answer and upvote it? Thx! – Tim Kuehn Apr 22 '16 at 11:56
  • @TimKuehn: I started reading the book 'Learning Neo4j' (the Packt book), and it discussed Labels: '(with labels) there is no longer a need to work with a type property on the nodes, or a need to connect nodes to definition nodes that provide meta-information about the graph.' I wonder if Labels could also somehow solve my problem. Anyway, thanks, and yes, answer accepted. – Monica Heddneck Apr 22 '16 at 17:57
  • Labels are ways of saying "These nodes have something in common", and they're also needed when creating indexes to make lookups go faster. I think your issue is more of an organizational / how do I connect the dots kind of question. – Tim Kuehn Apr 22 '16 at 19:21
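
On the "get everything connected to this node without mapping every single path" idea raised in the comments: one possible approach, assuming the APOC plugin is installed (it is not part of core Cypher and is not used anywhere in the original question or answer), is the apoc.path.subgraphNodes procedure, which visits each reachable node only once. The sketch below is untested against the graph in the question; fam, members and m are illustrative names.

```cypher
// Find the starting person, then let APOC collect every node reachable
// over RELATED_TO relationships. A relationshipFilter with no direction
// marker follows relationships both ways, which also covers people who
// are linked in one direction only.
MATCH (a:Person {name: 'Gramps Doe'})
CALL apoc.path.subgraphNodes(a, {relationshipFilter: 'RELATED_TO'}) YIELD node
WITH collect(node) AS members
// Skip creating an empty family when the starting person was not found.
WHERE size(members) > 0
CREATE (fam:Family)
WITH fam, members
UNWIND members AS m
MERGE (fam)<-[:FAMILY]-(m)
```

Because the expansion tracks visited nodes rather than enumerating paths, it should not need a hop limit such as *1..8, though that is worth confirming on a copy of the data first.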