
I have seen this article about loading a single file from S3 into Neo4j. But if I have the data split across multiple part files (as is common for large datasets), how can I efficiently load it into a Neo4j DB?

Jeen Broekstra

1 Answer


If you want to import a large amount of CSV data (possibly from a large number of files) into a previously unused Neo4j DB, you should consider using the `import` command of the `neo4j-admin` tool.

You will need to use presigned URLs for all the CSV files, or you can first download all the files from S3.
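If you go the download route, the AWS CLI can fetch all the part files under a prefix in one command. A minimal sketch, assuming a hypothetical bucket/prefix layout and `part-*.csv` file names (substitute your own):

```shell
# Download every part file under the prefix into a local import directory.
# s3://my-bucket/exports/ and the part-*.csv pattern are illustrative.
aws s3 cp s3://my-bucket/exports/ ./import-csv/ \
  --recursive --exclude "*" --include "part-*.csv"
```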

The import command is very powerful but also takes some effort to configure properly (and may require you to modify your CSV files), so you should carefully read the documentation.
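As a rough sketch of what that configuration looks like: the `import` command accepts multiple files per node/relationship group, either as a comma-separated list or as a regular expression matching the part files, so you do not need to concatenate them first. The labels, file names, and header layout below are assumptions for illustration, not taken from the question:

```shell
# Offline bulk import into an unused database (Neo4j 4.x syntax).
# Headers live in separate files; the part files are matched by regex.
neo4j-admin import \
  --database=neo4j \
  --nodes=Person="persons-header.csv,persons-part-.*\.csv" \
  --relationships=KNOWS="knows-header.csv,knows-part-.*\.csv"
```

Note that the importer parallelizes internally across available CPU cores, which is part of why it is so much faster than transactional `LOAD CSV`.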

cybersam
  • Thanks @cybersam, but is there a way to parallelize this so that ingesting 1000+ part files goes faster? – dumbledorevsbalrog Sep 21 '20 at 19:40
  • The [import](https://neo4j.com/docs/operations-manual/current/tools/import/) command is supposed to be very fast, so you may not need parallelization. In any case, when doing concurrent updates to the DB, you have to avoid or work around issues like [deadlocks](https://neo4j.com/docs/java-reference/current/transaction-management/deadlocks/) -- which might be more effort than it is worth. – cybersam Sep 21 '20 at 19:58