Following up on my earlier question, "Neo4j Cypher manual relationship index, APOC trigger and data duplication", I have created a scenario that reproduces the issue:

CALL apoc.trigger.add('TEST_TRIGGER', "UNWIND keys({assignedRelationshipProperties}) AS key 
UNWIND {assignedRelationshipProperties}[key] AS map 
WITH map 
WHERE type(map.relationship) = 'LIVES_IN' 
CALL apoc.index.addRelationship(map.relationship, keys(map.relationship)) 
RETURN count(*)", {phase:'before'})

CREATE (p:Person) SET p.id = 1 RETURN p
CREATE (p:Person) SET p.id = 2 RETURN p
CREATE (c:City) RETURN c

MATCH (p:Person), (c:City) WHERE p.id = 1 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 10 RETURN type(r)
MATCH (p:Person), (c:City) WHERE p.id = 2 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 20 RETURN type(r)

Now, let's try to select the person with r.time = 10:

MATCH (p:Person)-[r:LIVES_IN]->(c:City) 
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person 
RETURN person

The query above correctly returns only one node.

Now, let's do the same but return the person count:

MATCH (p:Person)-[r:LIVES_IN]->(c:City) 
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person 
RETURN count(person)

The query above returns count = 2.

Why does this query return count = 2 instead of 1?

Also, the following query:

MATCH (p:Person)-[r:LIVES_IN]->(c:City) 
CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel

returns 2 relationships:

{
  "time": 10
}
{
  "time": 10
}

but I expect the manual index to contain only a single relationship with time = 10.

What am I doing wrong?

alexanoid

2 Answers

1

The first query in your example also returns two rows. Apparently you are looking at the result in graph view, which draws a repeated node only once. Either switch to table view or change the query to this:

MATCH (p:Person)-[r:LIVES_IN]->(c:City) 
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person 
RETURN ID(person)

You get two rows because two people live in the same city, and the index search runs once for each matched relationship. Try this:

MATCH (p:Person)-[r:LIVES_IN]->(c:City)
WITH DISTINCT c
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person 
RETURN COUNT(DISTINCT person)
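
One way to see where the duplication comes from, without involving the index at all, is to count the rows the MATCH produces before the procedure call. The following is a minimal sketch in plain Cypher. It assumes exactly the sample data from the question (two Person nodes, one City, two LIVES_IN relationships); the aliases rowsFromMatch and rowsAfterDistinct are made up for illustration.

```cypher
// Rows produced by the pattern alone: one per LIVES_IN relationship.
// A CALL placed after this MATCH would run once for each of these rows.
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
RETURN count(*) AS rowsFromMatch;

// Collapsing to distinct cities first leaves one row per city,
// so a CALL placed after the WITH would run once per city.
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
WITH DISTINCT c
RETURN count(*) AS rowsAfterDistinct;
```

With the sample data, the first statement should report 2 and the second 1, which matches the number of times the index lookup is executed in each variant.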
stdob--
  • Thanks for your answer. Yes, I have two people living in the same city, but only one of them has time=10, so I don’t understand why 2 results are returned instead of a single one. Could you please comment on this? – alexanoid May 24 '18 at 22:26
  • Because you first get a city for each person, and then you run the index search for the same city as many times as there are persons living in that city. – stdob-- May 24 '18 at 22:31
  • Well, how about the last query in my question, with apoc.index.relationships? I don’t use the city there. Why does it also return the same relationship twice instead of once? – alexanoid May 24 '18 at 22:37
  • Because the index search that follows the pattern `MATCH (p:Person)-[r:LIVES_IN]->(c:City)` is called twice, once per matched row; think of it as a loop. And Neo4j does not automatically remove duplicates from the result. – stdob-- May 24 '18 at 22:40
  • Thanks. The core issue: I’d like to replace filtering based on plain relationship properties, for example WHERE r.time=10, with manual indexes. So, is it a good idea to use DISTINCT with apoc.index.* queries if I have tens or hundreds of thousands of nodes? – alexanoid May 24 '18 at 22:46
  • DISTINCT still needs to be applied. By the way, I changed the second query: DISTINCT is now applied twice. – stdob-- May 24 '18 at 22:53
1

Your example query is simply wrong. Use this instead:

CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel

or this:

MATCH (c:City) 
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person 
RETURN count(person)
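
As a sanity check on the indexed lookups, the same question can be asked with a plain property filter, the `WHERE r.time = 10` form mentioned in the comments. This is only a sketch: it assumes the sample data from the question, and the alias personCount is illustrative. It does not touch the manual index, so it serves as a reference for the expected result rather than as a replacement for the queries above.

```cypher
// Reference query: filter on the relationship property directly,
// without going through the manual index.
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
WHERE r.time = 10
RETURN count(p) AS personCount
```

With the sample data this should return 1, which is the value the index-based count is expected to match.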
Michael Hunger