Neo4j primary results lost after secondary match fails

Question

I have a match that returns results 'con'. I then want to try to refine the results by optionally intersecting it with other patterns. If their is an intersection on a secondary pattern then those results should be returned otherwise the original should be returned intact.

My problem is that when there is no intersection the 'con' results become null so I can't return them as an alternative. This is a constant battle for me in a lot of my queries. What is a good method for intersecting a collection without loosing it when there is no match?

MATCH p0=(:node {name:”Sam”})-[:has*1]->(s0:friend)-[:sub*0..35]->(con) 
MATCH p1=(:node {name:”Toby”})-[:has*1]->(s1:friend)-[:sub*0..35]->(con) 
OPTIONAL MATCH (s0)-[:inst*1]-(a:ins)-[:inst*1]->(b:ins)<-[:inst*1]-(s1) 
OPTIONAL MATCH (b)-[:inst|sub*0..40]->(c)
WITH apoc.coll.intersection(collect(distinct con),collect(distinct c)) as results,con
UNWIND results as co
RETURN DISTINCT 
CASE WHEN co IS NULL THEN con ELSE co END AS res

I've returned 'con' separately as a test and where there is a 'co' there is still a 'con' but when 'co' is null so is 'con'.

Thanks for your help!

What are you trying to do? Since your `WITH` clause uses `con` as the [aggregation](https://neo4j.com/docs/developer-manual/current/cypher/functions/aggregating/#query-functions-aggregating) grouping key, `collect(distinct con)` will always be a list containing just that single `con` node. Thus, `result` will either be an empty list (if there are no `c` nodes for that `con`, or if `con` is not one of the `c` nodes), or a list containing just that `con` node. So, even if you fixed the issue with `UNWIND` (as indicated by @InverseFalcon), you'd still end up with `res` always being just `con`. — cybersam, Jul 24 '18 at 09:06
What do you mean when saying `con` is the "aggregation grouping key"? In my queries `con` usually has many nodes. I need to filter them to refine the results. In this case I only want the `con` nodes that are also in `c`. `c` either has common nodes with `con` or is null. When `c` is null `res` should have all of con's nodes. When `c` and `con`have common nodes then `res` should be only those common nodes. The original example query was oversimplified i've updated it above. — Damon, Jul 24 '18 at 14:18
(My comments are based on your original query. Please only add new queries -- do not replace them, as that messes up all the answers and comments up to then.) [The link](https://neo4j.com/docs/developer-manual/current/cypher/functions/aggregating/#query-functions-aggregating) I provided describes how aggregation works, and discusses grouping keys. My point is that `res` would (after fixing the `UNWIND` issue) just always just be the individual `con` nodes. So, you'd get the same results if you replaced *everything* starting at the the first `OPTIONAL MATCH` clause with just `RETURN con`. — cybersam, Jul 24 '18 at 17:23
Thank you. The only solution I can come up with is to drop the `UNWIND` and return both `results` and `con` and then deal with them after. I'm replacing the last 3 lines of the query above with `WITH apoc.coll.intersection(collect(distinct con),collect(distinct c)) as results,con RETURN DISTINCT results,con` If anyone can see another way it would be greatly appreciated. — Damon, Jul 24 '18 at 20:18

score 1 · Answer 1 · answered Jul 24 '18 at 02:24

1

The problem here is your UNWIND. UNWIND is like taking the cross product of each list element with each row for the list, so if the list is empty, the row for that list will be wiped out.

We've added a section in the documentation on how to use CASE to use a non-empty list during UNWIND to get around this.

Applied to your query it would look like:

MATCH p0=(:node {name:"Sam"})-[:has*1]->(s0:friend)-[:sub*0..35]->(con) 
OPTIONAL MATCH (s0)-[:inst*1]-(a:ins)-[:inst*1]->(b:ins)<-[:inst*1]-(s1) 
OPTIONAL MATCH (b)-[:inst|sub*0..40]->(c)
WITH apoc.coll.intersection(collect(distinct con),collect(distinct c)) as results,con
UNWIND CASE WHEN size(results) = 0 THEN [null] ELSE results END as co
RETURN DISTINCT 
CASE WHEN co IS NULL THEN con ELSE co END AS res

answered Jul 24 '18 at 02:24

InverseFalcon

29,576
4
38
51

This is great, I looked for this for hours, just in all the wrong places. There is still something going wrong in my logic since i'm always getting con as results. – Damon Jul 24 '18 at 13:07
So what I don't understand is if I'm `unwinding` `results` why is that deleting `con` when `results` is an empty list?.. since `con` is not part of the `unwind` I expected it to be passed through to `return` untouched. – Damon Jul 24 '18 at 14:52
Keep in mind that records or rows are being passed between operations. Within that record/row is your list. An UNWIND of the list operates upon the row, doing the cross product of the entire row with each list element, so if the list is empty it wipes out the entire row. It may help if you do a `RETURN results, con LIMIT 5` before your UNWIND (comment out the rest) just to see an example of the row data before the UNWIND takes place. Any row with an empty `results` list would have been wiped out. – InverseFalcon Jul 25 '18 at 02:56
Thanks @InverseFalcon I think this helps get me over a mental block. I've been thinking of variables like stand alone arrays when in fact, if I now understand, they are more like tabular data with each variable being passed on as a column in the same table and each row is a cross product of all columns/variables... or I may still be in the dark haha! Thanks! – Damon Jul 25 '18 at 16:21
Glad to help, I think you've got a better idea of how it's handled. In truth Neo4j is producing records (consisting of variable names and values) which act as input/output between operations. It's sometimes easier to think of these records as rows with columns, especially if you're coming from a rdbms background. In any case the important thing to keep in mind, as you mentioned, is that the columns/variables are grouped per record/row. – InverseFalcon Jul 25 '18 at 16:30

Damon · Answer 2 · 2018-07-25T16:48:06.730

Through no intelligence of my own I have found a solution. In this use case c will always be a subset of con which means that if there are any results in c the intersection of c and con will always return results. This allows me to use the size of c to determine whether to return con or as a last operation preform apoc.coll.intersection() and return the intersection of con and c

MATCH p0=(:node {name:”Sam”})-[:has*1]->(s0:friend)-[:sub*0..35]->(con) 
MATCH p1=(:node {name:”Toby”})-[:has*1]->(s1:friend)-[:sub*0..35]->(con) 
OPTIONAL MATCH (s0)-[:inst*1]-(a:ins)-[:inst*1]->(b:ins)<-[:inst*1]-(s1) 
OPTIONAL MATCH (b)-[:inst|sub*0..40]->(c)
RETURN DISTINCT CASE WHEN size(collect(c)) = 0 THEN collect(con) ELSE apoc.coll.intersection(collect(distinct con),collect(distinct c)) END as co LIMIT 50

I had to collect(con) in the RETURN so that it is returned in the same form as co for post processing. There are likely simpler ways that I have not found by trial and error.

Neo4j primary results lost after secondary match fails

2 Answers2