I have a query that reads a set of IDs from a CSV file, searches for those nodes in the database, and writes the results to a CSV. I'm trying to get this query to run as quickly as possible and was wondering if I could parallelise the read operation using apoc.periodic.iterate:
http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/cypher-execution/commit-batching/
I've written a query that does what I need, but really I just want to find out how to run it as quickly as possible. Here's the current version of the query:
CALL apoc.export.csv.query('CALL apoc.load.csv(\'file:///edge.csv\') YIELD map AS edge
MATCH (n:paper)
WHERE n.paper_id = edge.`From` OR n.paper_id = edge.`To`
RETURN n.paper_title',
'node.csv', {});
This query creates the resulting node.csv file that I want, but as edge.csv grows in size the operation can slow down considerably.
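One thing I want to rule out first (this is just my guess at a contributing factor, and assumes I don't already have one) is a missing index on paper_id, since without it every row of edge.csv would trigger a full scan of the :paper label. In Neo4j 3.5 that would be:

CREATE INDEX ON :paper(paper_id);

With the index in place, I believe each equality lookup on paper_id should become an index seek instead of a label scan.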
What I was hoping to do was something like this:
CALL apoc.periodic.iterate(
'LOAD CSV WITH HEADERS FROM \'file:///edge.csv\' as row RETURN row',
'CALL apoc.export.csv.query(\'MATCH (n:paper) WHERE n.paper_id = row.`From` OR n.paper_id = row.`To` RETURN DISTINCT(n.paper_id) AS paper_id\', \'nodePar.csv\', {})'
, {batchSize:10, iterateList:true, parallel:true, failedParams:0})
;
This query runs, but produces no output other than the following message:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch | operations | wasTerminated | failedParams |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 14463 | 144629 | 0 | 144629 | 0 | 0 | 0 | {} | {total: 14463, committed: 14463, failed: 0, errors: {}} | {total: 144629, committed: 144629, failed: 0, errors: {}} | FALSE | {} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
My main question is: can apoc.periodic.iterate be used in this way to accelerate this query, and if so, how?
And then alternatively, is there any other way to speed up this query as the edge.csv file grows in size?
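One alternative I've been sketching (untested, and the aliases here are just my own names) is a single export query that collects all the IDs first and then does equality matches via UNWIND, instead of the OR comparison per row:

CALL apoc.export.csv.query('CALL apoc.load.csv(\'file:///edge.csv\') YIELD map AS edge
WITH collect(edge.`From`) + collect(edge.`To`) AS ids
UNWIND ids AS id
MATCH (n:paper {paper_id: id})
RETURN DISTINCT n.paper_title AS paper_title',
'node.csv', {});

My understanding is that the equality match on paper_id can use an index on :paper(paper_id), whereas the OR version may not, but I'd be happy to be corrected on that.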