I created three test nodes whose name properties are "a", "b", and "c", and used apoc.coll.zip() to combine two lists:

MATCH (n:test) 
WITH collect(n.name) as nodes 
WITH apoc.coll.zip(nodes, range(0, size(nodes))) as pairs 
RETURN pairs;

+--------------------------------+
| pairs                          |
+--------------------------------+
| [["a", 0], ["b", 1], ["c", 2]] |
+--------------------------------+

The result is expected. What is interesting is when I modify the query either by adding another column in the RETURN clause or by UNWINDing the pair.

1. RETURN pairs,n.name;

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs 
RETURN pairs,n.name;
+---------------------+
| pairs      | n.name |
+---------------------+
| [["a", 0]] | "a"    |
| [["b", 0]] | "b"    |
| [["c", 0]] | "c"    |
+---------------------+

I expected the result to be exactly the same as that of this query:

MATCH (n:test) 
WITH n, [["a", 0], ["b", 1], ["c", 2]] as nested 
RETURN nested, n.name;

+-----------------------------------------+
| nested                         | n.name |
+-----------------------------------------+
| [["a", 0], ["b", 1], ["c", 2]] | "a"    |
| [["a", 0], ["b", 1], ["c", 2]] | "b"    |
| [["a", 0], ["b", 1], ["c", 2]] | "c"    |
+-----------------------------------------+

2. UNWIND pairs as pair RETURN pairs

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs
UNWIND pairs as pair
RETURN pairs;

+------------+
| pairs      |
+------------+
| [["a", 0]] |
| [["b", 0]] |
| [["c", 0]] |
+------------+

I expect the result to be no different than having no UNWIND clause:

+--------------------------------+
| pairs                          |
+--------------------------------+
| [["a", 0], ["b", 1], ["c", 2]] |
+--------------------------------+

3. UNWIND pairs as pair RETURN pair

MATCH (n:test)
WITH n, collect(n.name) as nodes
WITH n, apoc.coll.zip(nodes, range(0, size(nodes))) as pairs
UNWIND pairs as pair
RETURN pair;
+----------+
| pair     |
+----------+
| ["a", 0] |
| ["b", 0] |
| ["c", 0] |
+----------+

I expected the result to be no different from simply UNWINDing a nested list:

UNWIND [["a", 0], ["b", 1], ["c", 2]] as list 
RETURN list;

+----------+
| list     |
+----------+
| ["a", 0] |
| ["b", 1] |
| ["c", 2] |
+----------+

Do you know why this happens? It doesn't seem to be explained in the RETURN and UNWIND documentation.

Ooker

1 Answer

For all three queries listed, the key point is this line:

...
WITH n, collect(n.name) as nodes
...

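collect() is an aggregating function, and every non-aggregated expression that appears next to it in the same WITH clause acts as a grouping key. Specifying 'n' in the WITH clause therefore causes a "group by" on n, similar to SQL grouping: the names are collected separately for each distinct node. With 3 nodes you get 3 rows, each holding a one-element list, which is why every pair ends up with index 0.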
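To see the grouping effect in isolation, here is a minimal sketch that needs neither the test nodes nor APOC. The literal list and the variable names `name` and `names` are only illustrative stand-ins for the nodes in the question.

```cypher
// No grouping key: collect() aggregates over all rows,
// so a single row comes back holding all three names.
UNWIND ["a", "b", "c"] AS name
WITH collect(name) AS names
RETURN names;

// `name` sits next to the aggregate, so it becomes a grouping key:
// one row per distinct name, each with a one-element list.
UNWIND ["a", "b", "c"] AS name
WITH name, collect(name) AS names
RETURN name, names;
```

The second query mirrors `WITH n, collect(n.name) as nodes` from the question: the aggregation runs once per group instead of once overall.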

You can debug by RETURNing after the WITH to see the result at each step, like so,

MATCH (n:test)
WITH n, collect(n.name) as nodes
RETURN n, nodes
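One possible way to get the full list of pairs on every row, sketched under the assumption that you still need each node afterwards: aggregate without a grouping key first, then UNWIND the collected nodes again. The variable name `ns` is hypothetical; apoc.coll.zip() is used as in the question.

```cypher
MATCH (n:test)
// Collect the nodes themselves with no grouping key: exactly one row.
WITH collect(n) AS ns
// Build the pairs once from the complete list.
// range() includes its upper bound, so size(ns) - 1 yields one index per node.
WITH ns, apoc.coll.zip([x IN ns | x.name], range(0, size(ns) - 1)) AS pairs
// Expand back to one row per node; `pairs` is carried along unchanged.
UNWIND ns AS n
RETURN pairs, n.name;
```

Because `pairs` is computed before the UNWIND, every row should see the same complete list. Note that the order in which collect() gathers the nodes is not guaranteed unless you sort first.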
aldrin
  • Hmm. So the way to fix this is to have a dedicated `WITH` clause for `n`? Once there is a `WITH` clause for a variable, will it be carried forward, even when later `WITH` clauses don't include it? – Ooker Oct 27 '21 at 11:08
  • One simplistic way to look at it is to imagine each `WITH` clause as a stopping point in the execution, where the intermediate results are collected and then sent as input to the next part of the Cypher statement. – aldrin Oct 27 '21 at 11:22
  • Yes, but when the next part has another `WITH` clause and we don't mention the variable again, would it be forgotten again? – Ooker Oct 27 '21 at 11:48
  • Yes, that's right. – aldrin Oct 27 '21 at 12:17
  • But if so, then how can the fix work? `n` will be forgotten right after `WITH collect(n.name) as nodes`. – Ooker Oct 27 '21 at 15:16
  • Yes, it will be forgotten. You have to carry over `n` if you want to refer to it later, or pull out the properties that you need from `n` and include them in the `WITH` clause. – aldrin Oct 28 '21 at 10:52