How to optimize this cypher query to get faster Result

Question

the-file-name.csv is on the previous question How to make the Cypher request to handle those both cases without duplicating Nodes

On the first step I do

CREATE CONSTRAINT ON (r:Region) ASSERT r.region IS UNIQUE;
CREATE CONSTRAINT ON (c:City) ASSERT c.cityName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.sectorName IS UNIQUE;

Here I put the cypher query.

LOAD CSV WITH HEADERS FROM "file:///the-file-name.csv" as line 
FIELDTERMINATOR ','  
with line as line
OPTIONAL MATCH (n:Region) 
WHERE  n.region contains "BLANKEMPTYVIDE" AND n.identifier= line.CODE_TER 
WITH n, line
CALL apoc.do.when(
   n IS NULL,
  'MERGE (r:Region{region: line.TERRITOIRE}) ON CREATE SET r.description=line.TERRITOIRE ON CREATE SET r.identifier=line.CODE_TER ON CREATE SET r.id = toString(id(r)) RETURN r',
  'RETURN n AS r',
  {n: n, line: line}) YIELD value
WITH value.r AS r, line 
SET r.identifier=line.CODE_TER
SET r.description=line.TERRITOIRE
SET r.region=line.TERRITOIRE
WITH r,line
OPTIONAL MATCH (ci:City) where ci.cityName contains "BLANKEMPTYVIDE" AND ci.regionIdentifier= line.CODE_TER
CALL apoc.do.when(
   ci IS NULL,
  'MERGE (c:City {cityName:line.BRICK}) ON CREATE SET c.identifier=line.CODE_BRICK ON CREATE SET c.region=line.TERRITOIRE ON CREATE SET c.regionIdentifier=line.CODE_TER ON CREATE SET c.zip=line.CODE_BRICK ON CREATE SET c.description=line.BRICK ON CREATE SET c.id = toString(id(c)) RETURN c',
  'RETURN ci AS c',
  {ci: ci, line: line}) YIELD value
WITH value.c AS c,line,r
SET c.identifier=line.CODE_BRICK 
SET c.region=line.TERRITOIRE
SET c.regionIdentifier=line.CODE_TER
SET c.zip=line.CODE_BRICK
SET c.description=line.BRICK
SET c.cityName=line.BRICK  
WITH c,r,line
MATCH (c {identifier:line.CODE_BRICK}),(r {identifier:line.CODE_TER})
MERGE (c)-[:IS_A_City_BELONGING_TO]->(r)
WITH c,r,line
OPTIONAL MATCH (sec:Sector) 
WHERE  sec.description contains "BLANKEMPTYVIDE" AND sec.regionIdentifier=line.CODE_TER 
CALL apoc.do.when(
   sec IS NULL,
  'MERGE (s:Sector {sectorName:line.SOUSBRICK}) ON CREATE SET s.identifier=line.CODE_SBRICK ON CREATE SET s.region=line.TERRITOIRE ON CREATE SET s.regionIdentifier=line.CODE_TER ON CREATE SET s.city=line.BRICK ON CREATE SET s.cityIdentifier=line.CODE_BRICK ON CREATE SET s.description=line.SOUSBRICK ON CREATE SET s.zip=line.SOUSBRICK ON CREATE SET s.id = toString(id(s)) RETURN s',
  'RETURN sec AS s',
  {sec: sec, line: line}) YIELD value
WITH value.s AS s,line,c,r
SET s.identifier=line.CODE_SBRICK 
SET s.region=line.TERRITOIRE 
SET s.regionIdentifier=line.CODE_TER 
SET s.city=line.BRICK 
SET s.cityIdentifier=line.CODE_BRICK 
SET s.description=line.SOUSBRICK 
SET s.zip=line.SOUSBRICK 
SET s.sectorName = line.SOUSBRICK   
WITH s,c,r,line
MATCH (s {identifier:line.CODE_SBRICK}),(r{identifier:line.CODE_TER}) 
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_REGION]->(r) 
WITH s,c,r,line
MATCH (s {identifier:line.CODE_SBRICK}),(c{identifier:line.CODE_BRICK})
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_CITY]->(c)

The result :

Set 28271 properties, created 3400 relationships, completed after 34985 ms.

It took nearby 35 sec to load csv to neo4j db. please is it right perfect in performance view ?

for a csv file containing 1665 csv lines each line contains 6 fields : two first fields for (Region) the third and fourth (City) and the fifth and sixth for (Sector).

How to optimize this neo4j query so that it takes just few seconds below 3 seconds ?

Here below I put the profiler result picture.

Here I made the update upon advice from InverseFalcon

LOAD CSV WITH HEADERS FROM "file:///the-file-name.csv" as line 
FIELDTERMINATOR ','  
with line as line
OPTIONAL MATCH (n:Region) 
WHERE  n.region contains "BLANKEMPTYVIDE" AND n.identifier= line.CODE_TER 
WITH n, line
CALL apoc.do.when(
   n IS NULL,
  'MERGE (r:Region{region: line.TERRITOIRE}) ON CREATE SET r.description=line.TERRITOIRE ON CREATE SET r.identifier=line.CODE_TER ON CREATE SET r.id = toString(id(r)) RETURN r',
  'SET n.identifier=line.CODE_TER  SET n.description=line.TERRITOIRE  SET n.region=line.TERRITOIRE RETURN n AS r',
  {n: n, line: line}) YIELD value
WITH value.r AS r, line 

OPTIONAL MATCH (ci:City) where ci.cityName contains "BLANKEMPTYVIDE" AND ci.regionIdentifier= line.CODE_TER
CALL apoc.do.when(
   ci IS NULL,
  'MERGE (c:City {cityName:line.BRICK}) ON CREATE SET c.identifier=line.CODE_BRICK ON CREATE SET c.region=line.TERRITOIRE ON CREATE SET c.regionIdentifier=line.CODE_TER ON CREATE SET c.zip=line.CODE_BRICK ON CREATE SET c.description=line.BRICK ON CREATE SET c.id = toString(id(c)) RETURN c',
  'SET ci.identifier=line.CODE_BRICK SET ci.region=line.TERRITOIRE SET ci.regionIdentifier=line.CODE_TER SET ci.zip=line.CODE_BRICK SET ci.description=line.BRICK SET ci.cityName=line.BRICK  RETURN ci AS c',
  {ci: ci, line: line}) YIELD value
WITH value.c AS c,line,r
MERGE (c)-[:IS_A_City_BELONGING_TO]->(r)
WITH c,r,line
OPTIONAL MATCH (sec:Sector) 
WHERE  sec.description contains "BLANKEMPTYVIDE" AND sec.regionIdentifier=line.CODE_TER 
CALL apoc.do.when(
   sec IS NULL,
  'MERGE (s:Sector {sectorName:line.SOUSBRICK}) ON CREATE SET s.identifier=line.CODE_SBRICK ON CREATE SET s.region=line.TERRITOIRE ON CREATE SET s.regionIdentifier=line.CODE_TER ON CREATE SET s.city=line.BRICK ON CREATE SET s.cityIdentifier=line.CODE_BRICK ON CREATE SET s.description=line.SOUSBRICK ON CREATE SET s.zip=line.SOUSBRICK ON CREATE SET s.id = toString(id(s)) RETURN s',
  'SET sec.identifier=line.CODE_SBRICK SET sec.region=line.TERRITOIRE SET sec.regionIdentifier=line.CODE_TER SET sec.city=line.BRICK SET sec.cityIdentifier=line.CODE_BRICK SET sec.description=line.SOUSBRICK SET sec.zip=line.SOUSBRICK SET sec.sectorName = line.SOUSBRICK   RETURN sec AS s',
  {sec: sec, line: line}) YIELD value
WITH value.s AS s,line,c,r
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_REGION]->(r) 
WITH s,c,r,line
MERGE (s)-[:IS_A_SECTOR_BELONGING_TO_THAT_CITY]->(c)

added index on description for sector constrains becomes

CREATE CONSTRAINT ON (r:Region) ASSERT r.region IS UNIQUE;
CREATE CONSTRAINT ON (c:City) ASSERT c.cityName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.sectorName IS UNIQUE;
CREATE CONSTRAINT ON (s:Sector) ASSERT s.description IS UNIQUE;

Plan from profiler

Cypher version: CYPHER 3.4, planner: COST, runtime: INTERPRETED. 175350 total db hits in 14321 ms****

Directely without Profiler Created 3403 relationships, completed after 12702 ms.

Good enhancement from 36 sec to 12 sec but still need optimization.

Can you add to your question the reasoning behind your usage of the "BLANKEMPTYVIDE" substring? It may be flawed, as it seems that it can be possible to attempt to execute `SET` clauses that violate your uniqueness constraints. Also, can you explain why you thought you needed to call `apoc.do.when` instead of using pure Cypher? — cybersam, May 31 '18 at 01:53
Thank you cybersam if you access my previous questions I explain why. https://stackoverflow.com/questions/50458249/merge-with-optional-match-to-load-csv-to-neo4j-gdb https://stackoverflow.com/questions/50497811/how-to-make-the-cypher-request-to-handle-those-both-cases-without-duplicating-no — Schwertfisch, May 31 '18 at 03:16
For using apoc.do.when it s not obligtory or mandatory solution but I find that using it it will solve my problem as I describe it on the previous questions (links given in my previous comment just up the current comment) — Schwertfisch, May 31 '18 at 03:36

score 0 · Accepted Answer · answered May 31 '18 at 01:58

The operations with the largest DB hits reference a NodeByLabelScan followed by a filter, yielding only 1663 rows or results from the starting 1.3 million or so nodes of the label. This seems like an opportunity to add an index, if possible.

You haven't expanded the nodes in the query plan, so we can't see any info on which part of the query those db hits are associated with, but if I were to guess it would be from here:

OPTIONAL MATCH (sec:Sector) 
WHERE  sec.description contains "BLANKEMPTYVIDE" AND sec.regionIdentifier=line.CODE_TER

I don't see any evidence you've created a constraint or index on :Sector(description) or :Sector(regionIdentifier), so this is probably where the planner was forced to use a NodeByLabelScan, which as you can see is expensive. You're going to want an index on one of these, or a unique constraint (depending on if any of these properties is unique to :Sector).

Separately, you're doing a curious thing in several places throughout your query that looks like this:

MATCH (s {identifier:line.CODE_SBRICK}),(r{identifier:line.CODE_TER})

The variables you're using in your MATCH are already bound, so I would suggest using a WHERE clause for these instead to make it clear that this is meant to be a filtering step:

WITH s,c,r,line
WHERE s.identifier = line.CODE_SBRICK AND r.identifier = line.CODE_TER

That said, in all of these cases you've already explicitly set those properties to the line properties in question, so there's no need to perform this kind of filtering anyway, it's unnecessary. I'd suggest removing those matches and just MERGE the relationships.

Thank you very much InverseFalcon for your hints and for your long detailled answer. I ll update my query following your advice and put feed back on my question. It s very late time to do it right now. but i ll do it as soon as possible. Thanks a lot InverseFalcon for your help. — Schwertfisch, May 31 '18 at 03:13

How to optimize this cypher query to get faster Result

1 Answers1