I have a query that reads a set of IDs from a CSV file, searches for those nodes in the database, and writes the results to a CSV. I'm trying to get this query to run as quickly as possible and was wondering if I could parallelise the read operation using apoc.periodic.iterate:
http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/cypher-execution/commit-batching/
I've written a query that does what I need, but really I just want to find out how to run it as quickly as possible. Here's the current version of the query:
CALL apoc.export.csv.query('CALL apoc.load.csv(\'file:///edge.csv\') YIELD map AS edge
MATCH (n:paper)
WHERE n.paper_id = edge.`From` OR n.paper_id = edge.`To`
RETURN n.paper_title',
'node.csv', {});
This query creates the resulting node.csv file that I want, but as edge.csv grows in size the operation can slow down considerably.
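One thing I want to rule out first (this is just my guess at a contributing factor, and assumes I don't already have one) is a missing index on paper_id, since without it every row of edge.csv would trigger a full scan of the :paper label. In Neo4j 3.5 that would be:

CREATE INDEX ON :paper(paper_id);

With the index in place, I believe each equality lookup on paper_id should become an index seek instead of a label scan.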
What I was hoping to do was something like this:
CALL apoc.periodic.iterate(
'LOAD CSV WITH HEADERS FROM \'file:///edge.csv\' as row RETURN row',
'CALL apoc.export.csv.query(\'MATCH (n:paper) WHERE n.paper_id = row.`From` OR n.paper_id = row.`To` RETURN DISTINCT(n.paper_id) AS paper_id\', \'nodePar.csv\', {})'
, {batchSize:10, iterateList:true, parallel:true, failedParams:0})
;
This query runs, but produces no output other than the following message:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch | operations | wasTerminated | failedParams |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 14463 | 144629 | 0 | 144629 | 0 | 0 | 0 | {} | {total: 14463, committed: 14463, failed: 0, errors: {}} | {total: 144629, committed: 144629, failed: 0, errors: {}} | FALSE | {} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
My main question is: can apoc.periodic.iterate be used in this way to accelerate this query, and if so, how?
And then alternatively, is there any other way to speed up this query as the edge.csv file grows in size?
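One alternative I've been sketching (untested, and the aliases here are just my own names) is a single export query that collects all the IDs first and then does equality matches via UNWIND, instead of the OR comparison per row:

CALL apoc.export.csv.query('CALL apoc.load.csv(\'file:///edge.csv\') YIELD map AS edge
WITH collect(edge.`From`) + collect(edge.`To`) AS ids
UNWIND ids AS id
MATCH (n:paper {paper_id: id})
RETURN DISTINCT n.paper_title AS paper_title',
'node.csv', {});

My understanding is that the equality match on paper_id can use an index on :paper(paper_id), whereas the OR version may not, but I'd be happy to be corrected on that.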