
This question is a direct extension of a question I asked previously here (and an even earlier version here).

Say I have a graph database that looks like this:

(Image: a graph with two disjoint groups of person nodes, three in the top row and five in the bottom row.)

Just like the previous questions I asked, the only really interesting thing about this is that SomeProperty can be 'Yes' or 'No'.

In the top row, 1 of the three nodes has a 'Yes' for this property.

On the bottom row, 3 of the five nodes have a 'Yes' for this property.

(Slight philosophical sidenote: I'm starting to suspect that this is a bad graph schema. Why? Because, within each set of nodes, each node is connected to every other node. I'm not worried about the fact that there are two groups of nodes, but about the fact that when I populate this graph, I get talkback that says, 'Returned 530 rows.' I think this means I actually created 530 subpaths within the graph structure, and this seems like overkill.)

Anyway, the problem I'm trying to solve is pretty much the same as the problem I was trying to solve in the earlier, simpler, more linear context here.

I want to return the full path of either of these disjoint graphs whenever, anywhere within that graph, the number of nodes with SomeProperty = 'Yes' is greater than 2.

I would think this is a common, simple problem. For example, say you had two unrelated families, and someone says, "Show me which family has more than 2 left-handed people."
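To make the request concrete, here is a minimal sketch of what "count the 'Yes' nodes per connected group" could look like in Cypher. It assumes the sample data below and treats RELATED_TO as undirected; the variable-length match enumerates every path, so it is only reasonable on tiny graphs like this one and is not meant as a definitive solution.

```cypher
// Sketch: for each person, gather everyone reachable over RELATED_TO
// (zero or more hops, either direction), then keep the groups with
// more than 2 members whose SomeProperty is 'Yes'.
MATCH (a:person)-[:RELATED_TO*0..]-(b:person)
WITH a, collect(DISTINCT b) AS members
WHERE size([m IN members WHERE m.SomeProperty = 'Yes']) > 2
RETURN a.name AS startingFrom, [m IN members | m.name] AS groupMembers;
```

This returns one row per person in a qualifying group, so the same group shows up once for each of its members.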

The super smart #cybersam recommended for the simpler incarnation of this problem, something along the lines of:

MATCH p=(a:person)-[:RELATED_TO*]->(b:person)
WHERE
  NOT ()-[:RELATED_TO]->(a) AND
  NOT (b)-[:RELATED_TO]->() AND
  2 < REDUCE(s = 0, x IN NODES(p) | CASE WHEN x.SomeProperty = 'Yes' THEN s + 1 ELSE s END)
RETURN p;

...which works great if the graph resembles more of a straight line, and doesn't have each node in the set related to each other node.

I think the reason why #cybersam's query won't handle this more complex graph is because there is no terminal node.
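One way to check that suspicion, sketched against the sample data below: look for any person that lacks an incoming or an outgoing RELATED_TO relationship. Those are the only nodes that could satisfy the two NOT conditions in the query above.

```cypher
// Sketch: list people that could act as a start node (no incoming
// RELATED_TO) or an end node (no outgoing RELATED_TO) of a path.
MATCH (a:person)
WHERE NOT ()-[:RELATED_TO]->(a)
   OR NOT (a)-[:RELATED_TO]->()
RETURN a.name;
```

With the data below, every person has both an incoming and an outgoing RELATED_TO relationship, so this returns no rows, and the original query therefore has no valid start or end node to match.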

(Another philosophical sidenote: I'm starting to come up with a theory that dense, intricate relationships in a graph pose combinatorial problems, for performance as well as for querying. I think this might be due to the bidirectionality used by Cypher when querying?)

Here's my data. Any advice is appreciated, and thanks for helping me climb the learning curve.

// match (n) detach delete n;

CREATE (albert:person {gender: 'Male', name: 'Albert', SomeProperty: 'Yes'})
CREATE (annie:person {gender: 'Female', name: 'Annie', SomeProperty: 'No'})
CREATE (adrian:person {gender: 'Female', name: 'Adrian', SomeProperty: 'No'})

CREATE (albert)-[:RELATED_TO]->(annie)
CREATE (annie)-[:RELATED_TO]->(albert)
CREATE (annie)-[:RELATED_TO]->(adrian)
CREATE (adrian)-[:RELATED_TO]->(annie)
CREATE (albert)-[:RELATED_TO]->(adrian)
CREATE (adrian)-[:RELATED_TO]->(albert)


CREATE (bill:person {gender: 'Male', name: 'Bill', SomeProperty: 'Yes'})
CREATE (barb:person {gender: 'Female', name: 'Barb', SomeProperty: 'Yes'})
CREATE (barry:person {gender: 'Male', name: 'Barry', SomeProperty: 'Yes'})
CREATE (bart:person {gender: 'Male', name: 'Bart', SomeProperty: 'No'})
CREATE (bartholemu:person {gender: 'Male', name: 'Bartholemu', SomeProperty: 'No'})

CREATE (bill)-[:RELATED_TO]->(barb)
CREATE (barb)-[:RELATED_TO]->(bill)
CREATE (barb)-[:RELATED_TO]->(barry)
CREATE (barry)-[:RELATED_TO]->(barb)
CREATE (barry)-[:RELATED_TO]->(bart)
CREATE (bart)-[:RELATED_TO]->(barry)
CREATE (bart)-[:RELATED_TO]->(bartholemu)
CREATE (bartholemu)-[:RELATED_TO]->(bart)
CREATE (bill)-[:RELATED_TO]->(bartholemu)
CREATE (bartholemu)-[:RELATED_TO]->(bill)
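As a sanity check on what actually got created (as opposed to how many rows the load reported), a small sketch that counts nodes and relationships. It assumes the database was empty before the statements above were run.

```cypher
// Sketch: count person nodes and the RELATED_TO relationships
// leaving them. Each relationship is counted once, from its start node.
MATCH (n:person)
OPTIONAL MATCH (n)-[r:RELATED_TO]->()
RETURN count(DISTINCT n) AS people, count(r) AS relationships;
```

The CREATE statements above define 8 people and 16 relationships, so those are the numbers to expect on a clean load.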
Monica Heddneck
  • Rather than try and wrestle with another variation of your question, can you tell us what the problem is you're trying to solve instead of the solution you think you need? – Tim Kuehn Apr 21 '16 at 20:48
  • there is no terminal node... which is what you asked for in your prior question. If you get rid of the WHERE / NOT conditions the terminal node filtering goes away. – Tim Kuehn Apr 21 '16 at 20:50

1 Answer

If this is about families of people, then the easiest fix is to add a :Family node for each relational group, like so:

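// Create one :Family node, then attach Adrian and everyone reachable from Adrian.
create (f:Family) with f
match (a:person {name:"Adrian"})-[:RELATED_TO*]->(b:person)
// f is already bound above, so refer to it without repeating the label
merge (f)<-[:FAMILY]-(a)
merge (f)<-[:FAMILY]-(b)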

Replace "Adrian" with "Barry" to create the second family group.
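If the variable-length match turns out to be slow on a densely connected group (it enumerates every path), here is a hedged alternative sketch that links people to a family by name instead. The `name` property on :Family and the hard-coded list of names are assumptions for illustration, not part of the original schema.

```cypher
// Sketch: create (or reuse) a named family node, then attach its
// members directly, without walking RELATED_TO paths.
// The Family name 'A' is hypothetical.
MERGE (f:Family {name: 'A'})
WITH f
MATCH (p:person)
WHERE p.name IN ['Albert', 'Annie', 'Adrian']
MERGE (p)-[:FAMILY]->(f);
```

Repeat with a different family name and the second group's names for the other family.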

That gives you a central :Family node for each family group. You can then pick the family group that has enough :person.SomeProperty = "Yes" family members like so:

// Find families with more than 2 :person.SomeProperty = "Yes"
match p = (f:Family)<-[:FAMILY]-(psn:person)
where psn.SomeProperty = "Yes"
with  f, count(psn) as cnt 
where cnt > 2

// Get the family members 
match (a:person)<-[r1:RELATED_TO]-(b:person)-[r2:RELATED_TO*]->(c)
where (a)-[:FAMILY]-(f)
  and a = c  // to get all the nodes in the loop 

// report the first record which'll have two  
// family members and all the relationships
return a, r1, b, r2 
limit 1
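If only the qualifying family and its member list are needed, rather than the relationships between members, a shorter sketch along the same lines could be used. It assumes the :Family nodes and FAMILY relationships have already been created as described above.

```cypher
// Sketch: collect each family's members, keep families with more
// than 2 'Yes' members, and return the member names.
MATCH (f:Family)<-[:FAMILY]-(psn:person)
WITH f, collect(psn) AS members
WHERE size([m IN members WHERE m.SomeProperty = 'Yes']) > 2
RETURN f, [m IN members | m.name] AS memberNames;
```

This avoids the variable-length match entirely, since membership is read straight off the FAMILY relationships.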
Tim Kuehn
  • Thank you for this! It almost works perfectly. It's a creative idea that I didn't even think of -- adding a new node to each family to help show that they are isolated. The only issue is that one of the new nodes is not being built, and the query runs forever... http://stackoverflow.com/questions/36781902/query-performance-when-adding-a-new-node-in-neo4j – Monica Heddneck Apr 21 '16 at 23:02
  • Welcome! If this works then please mark it as the answer and give it an upvote. :) – Tim Kuehn Apr 21 '16 at 23:09