
I am running a query in cypher-shell to create relationships between 10 million nodes. The query is:

CALL apoc.periodic.iterate(
"MATCH (a:HeaderRecord), (b:FormationRecord) 
 WHERE a.WellID = b.WellID 
 CREATE (a)-[rel:HAS_FORMATION]->(b) 
 RETURN rel",
 {batchSize:5000, parallel:true, iterateList:true}
)

The query has been running for the past hour, but nothing seems to be happening. How can I make this query verbose and fast?

logisima
Anshul Gupta
  • Can you repost your query? Something seems to be missing: `apoc.periodic.iterate` takes 3 parameters (2 Cypher queries + config). – logisima Apr 24 '18 at 16:14
  • How can I run this query in batches? It is a Cartesian product, and if I run it without apoc.periodic.iterate I lose the connection to Neo4j. – Anshul Gupta Apr 24 '18 at 16:22
  • I can't respond to your question until you have corrected the Cypher query in your question (see my previous comment; your query is not valid). – logisima Apr 24 '18 at 16:38

1 Answer


Is this the query you are using?

CALL apoc.periodic.iterate(
"MATCH (a:HeaderRecord), (b:FormationRecord) 
 WHERE a.WellID = b.WellID 
 RETURN a, b",
 "CREATE (a)-[rel:HAS_FORMATION]->(b)",
 {batchSize:5000, parallel:true, iterateList:true}
)

Have you created an index on :FormationRecord(WellID) or on :HeaderRecord(WellID)?
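
If the indexes are missing, they could be created along these lines. This is only a sketch and assumes the Neo4j 3.x syntax that was current when this thread was written; newer releases use the `CREATE INDEX FOR (n:Label) ON (n.property)` form instead.

```cypher
// Assumption: Neo4j 3.x index syntax.
// Index the join property on both labels so the lookup by WellID
// can use an index seek rather than a full label scan.
CREATE INDEX ON :HeaderRecord(WellID);
CREATE INDEX ON :FormationRecord(WellID);
```

Running `CALL db.indexes()` afterwards should list both indexes; wait until they are reported as online before starting the batch job.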

logisima
  • Yes, I have created an index on both. Should I create an index on only one? – Anshul Gupta Apr 24 '18 at 16:46
  • And is my query the one you are using? – logisima Apr 24 '18 at 16:58
  • Can you run an EXPLAIN of this query and give me the result: `EXPLAIN MATCH (a:HeaderRecord), (b:FormationRecord) WHERE a.WellID = b.WellID RETURN a, b` – logisima Apr 24 '18 at 16:58
  • This query builds a cartesian product between disconnected patterns. If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (b)) – Anshul Gupta Apr 24 '18 at 17:06
  • This is a textual result of that query, and the statistics are: NodeByLabelScan(HeaderRecord) 480,681 rows; NodeIndexSeek(FormationRecord) 168,334 rows; Produce results 1,683,341. Thanks for responding to my question. Is there a better way to execute this query? – Anshul Gupta Apr 24 '18 at 17:26
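
One possible way to restructure the job, given the plan above (a label scan on HeaderRecord followed by an index seek on FormationRecord), is to let the first statement stream only the HeaderRecord nodes and do the indexed lookup inside the batched statement. The following is a sketch, not a tested fix. It assumes an index on :FormationRecord(WellID) exists and that the installed APOC version exposes the yielded columns shown. `parallel` is set to false on the assumption that concurrent batches creating relationships on the same nodes could contend for locks.

```cypher
// Sketch under the assumptions stated above.
CALL apoc.periodic.iterate(
  // Outer statement: one row per HeaderRecord, no Cartesian product.
  "MATCH (a:HeaderRecord) RETURN a",
  // Inner statement: indexed lookup by WellID, then create the relationship.
  "MATCH (b:FormationRecord {WellID: a.WellID})
   CREATE (a)-[:HAS_FORMATION]->(b)",
  {batchSize: 5000, parallel: false}
)
// Summary columns describing how the run went.
YIELD batches, total, timeTaken, failedBatches, errorMessages
RETURN batches, total, timeTaken, failedBatches, errorMessages;
```

The yielded summary is returned only once the procedure finishes, so it reports on the run afterwards rather than showing live progress. Because each batch commits in its own transaction, counting the HAS_FORMATION relationships from a second session is one way to see whether the job is advancing. If the job may be re-run, `MERGE` in place of `CREATE` would avoid duplicate relationships.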