Neo4j WHERE causes duplicates?

Question

I'm running Neo4j Desktop v1.4.1; the database is Neo4j 4.2.1 Enterprise.

I have a simple graph of placements, campaigns, and a placement-to-campaign "contains" relationship. This is a fresh dataset in which every node is unique. Some placements "contain" thousands of campaigns, so I want to filter the returned campaigns by an inclusion list of campaign ids.

When I return all the matched nodes it works:

neo4j@neo4j> MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
             WHERE c.id IN [400,263,150470,25810,37578]
             RETURN *;
+--------------------------+
| c                        |
+--------------------------+
| (:Campaign {id: 37578})  |
| (:Campaign {id: 263})    |
| (:Campaign {id: 25810})  |
| (:Campaign {id: 150470}) |
+--------------------------+

When I return just the campaign id (c.id), I get duplicates:

neo4j@neo4j> MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
             WHERE c.id IN [400,263,150470,25810,37578]
             RETURN c.id;
+--------+
| c.id   |
+--------+
| 150470 |
| 150470 |
| 150470 |
| 150470 |
+--------+

There is only one CONTAINS relationship between placement 5 and campaign 150470:

neo4j@neo4j> MATCH (:Placement {id: 5})-[rel:CONTAINS]->(:Campaign {id:150470}) 
             RETURN count(rel);
+------------+
| count(rel) |
+------------+
| 1          |
+------------+

EXPLAIN returns the following query plan. Could the cache[c.id] be the culprit?

+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+
| Operator                  | Details                                                                                              | Estimated Rows | Other               |
+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +ProduceResults@neo4j     | `c.id`                                                                                               |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +Projection@neo4j         | cache[c.id] AS `c.id`                                                                                |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +Expand(Into)@neo4j       | (anon_7)-[anon_27:CONTAINS]->(c)                                                                     |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +MultiNodeIndexSeek@neo4j | UNIQUE anon_7:Placement(id) WHERE id = $autoint_0, cache[c.id], UNIQUE c:Campaign(id) WHERE id IN $a |             25 | In Pipeline 0       |
|                           | utolist_1, cache[c.id]                                                                               |                |                     |
+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+

Edit: If I prefix the query with CYPHER runtime=SLOTTED, I get the expected output:

+--------+
| c.id   |
+--------+
| 37578  |
| 263    |
| 25810  |
| 150470 |
+--------+
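
For reference, the full statement with the runtime hint would look roughly like the sketch below. It is the same query that returned duplicates, with only the prefix added. The prefix applies to this one statement, so other queries keep using the default runtime.

```cypher
CYPHER runtime=SLOTTED
MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
// Same filter and projection as the query that returned duplicates.
WHERE c.id IN [400,263,150470,25810,37578]
RETURN c.id;
```

Comparing this against the unprefixed query is a quick way to tell whether the duplicates come from the data or from the runtime that executes the plan.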

If I omit the WHERE clause, I get unique campaign ids (but too many). I feel like I'm missing something obvious, but I've read the Neo4j docs and I'm not getting it. Thanks!

Is it possible that the Placement with id 5 has multiple relationships of type CONTAINS to campaign id 150470? — Luanne, Feb 27 '21 at 03:44
Very strange, any chance you can share a script to recreate your graph? And which version of Neo4j is this? — Luanne, Feb 28 '21 at 11:25
Can you let us know what version of Neo4j you are running? Also, can you let us know if you get the same duplicate results if you prefix your query with `CYPHER runtime=SLOTTED ` ? — InverseFalcon, Mar 02 '21 at 01:17
@InverseFalcon thanks, `CYPHER runtime=SLOTTED ` returns the expected results! — David Farrell, Mar 02 '21 at 15:01
@DavidFarrell Can you let us know the version of Neo4j you're using? Since SLOTTED runtime avoids the issue, this must be a bug in PIPELINED runtime, but depending on the version you are using, it might have already been found and fixed in a later patch. — InverseFalcon, Mar 02 '21 at 20:22
Thanks, that gives us the Desktop version, but that's separate from the Neo4j version. Could you check the version of the Neo4j database you're running within Desktop? — InverseFalcon, Mar 03 '21 at 23:40
@DavidFarrell Thanks. Any chance you could test with 4.2.3, the latest patch? It is possible the bug behind this may have been fixed. — InverseFalcon, Mar 05 '21 at 04:11
