
I'm quite new to Neo4j and already lost among the out-of-date documentation and the unclear commands, whose effect and speed are hard to predict.

I am looking for a way to import some very large data fast. One kind of data alone is on the scale of billions of rows, split into multiple CSV files, but I don't mind merging them into one.
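If merging the parts turns out to be useful, here is a minimal Python sketch of how I would do it. It assumes every part file starts with the same header row; the function name `merge_csv_parts` and the file names in the usage note are hypothetical, not anything Neo4j provides.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional


def merge_csv_parts(parts: Iterable[Path], merged: Path) -> int:
    """Concatenate CSV parts into one file, writing the header only once.

    Assumption: every part begins with an identical header row.
    Returns the number of data rows written.
    """
    rows_written = 0
    header: Optional[List[str]] = None
    with merged.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for part in parts:
            with part.open(newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip completely empty part files
                if header is None:
                    # First non-empty part: remember and emit its header.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"header mismatch in {part}")
                # Stream rows one at a time so memory use stays flat.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csv_parts(sorted(Path("parts").glob("*.csv")), Path("all.csv"))` would stream every part into `all.csv` without loading a whole file into memory.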

Doing a very simple import (load csv ... create (n:XXX {id: row.id})) is taking ages; with a unique index in place, it takes days. I stopped the operation, dropped the unique index and restarted. That was about 2x faster, but still too slow.

I know about neo4j-import (although it is deprecated, and there is no documentation on the Neo4j website about "neo4j-admin import"). It's already extremely unclear how to do simple things with it, such as anything conditional. The biggest drawback is that it doesn't seem to work with an existing database.

The main question is: is there any way to accelerate the import of very large CSV files with Neo4j? First with a simple statement like create, but hopefully with match as well. Right now, running a Cypher command such as "match (n:X {id: "Y"}) return n limit 1" takes multiple minutes on the 1B nodes.

(I'm running this on a server with 200GB+ of RAM and 48 CPUs, so hardware is probably not the limitation.)

Einharch
  • Disk I/O is also really important! What do you call a very large CSV? What is your Neo4j configuration? Have you used periodic commits? – logisima Jul 28 '17 at 10:15
  • Thanks for the answer. I shared a bit more detail in a gist so as not to crowd my question: https://gist.github.com/Einharch/23a31f869787950a898fed051e1a6ee0 ... As for disks, while they are not SSDs, they are multiple RAIDs of fast disks (the files are read/created very quickly by other software). The files should also fit in RAM. The server runs only Neo4j 3.2.2 Community on Ubuntu, unmodified. Yes, I am using periodic commits every 100K. – Einharch Jul 29 '17 at 02:59
  • I have now changed from "merge" to "create"; in the explain output, it seemed like the most performant solution ... The script has been running for 4 days and still has not finished. And because it still locks the index (why???), I can't even split it up to run in multiple processes ... It's really bothering me, and I wonder if Neo4j is really ready for production? – Einharch Aug 02 '17 at 08:22
  • @Einharch Did you fix this problem? Also, it seems the gist is no longer accessible. – Liping Huang Sep 15 '17 at 01:19

0 Answers