I have a performance issue with bulk inserts into Neo4j.
I have a CSV file with 400k rows, which produces about 3.5 million rows, and I load it with the LOAD CSV command on the latest version of Neo4j.
I've noticed that when I use a CREATE statement, the load takes about 4 minutes, and with no indexes at all, about 3.5 minutes.
My first question is whether this is a normal rate of nodes per minute.
Now, my real problem is that I need to use MERGE for data integrity reasons, and when I do, the load can take up to 24 hours, even with indexes in place.
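For comparison, the fast CREATE variant of the load looks roughly like this (a simplified sketch with only two of the labels; it happily creates duplicate nodes, which is exactly what I need MERGE to prevent):

LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line FIELDTERMINATOR '|'
CREATE (:Session { session: line.session })
CREATE (:User { id: line.user_id })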
So two additional questions:
Is LOAD CSV the recommended approach for the best load performance,
and what can I do about this performance issue?
EDIT:
here is the query:
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line FIELDTERMINATOR '|'
MERGE (session:Session { session: line.session })
MERGE (hit:Hit { key: line.key, date_time: line.date_time, session: line.session })
MERGE (user:User { id: line.user_id })
MERGE (session2:Session2 { session2: line.session2 })
MERGE (country:Country { name: line.country })
MERGE (tv:TV { name: line.Model })
MERGE (transfer_protocol:Protocol { name: line.transfer_protocol })
MERGE (os:OS { name: line.os_name, version: line.os_version, row_key: line.os_name + line.os_version })
Sample: session_guid|hit_key_guid|useridguid|session2_guid|PANASONIC|TCP|ANDROID|5.0
The session, user, session2, country, tv, transfer_protocol, and os nodes each have a unique constraint, and hit has an index.
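For reference, the constraints and the index were set up along these lines (a sketch in the 2.x/3.x syntax, showing three of the seven constraints; property names are taken from the query above, and I am assuming the hit index is on key):

CREATE CONSTRAINT ON (s:Session) ASSERT s.session IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (c:Country) ASSERT c.name IS UNIQUE;
CREATE INDEX ON :Hit(key);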
Note: session and session2 can each have many hits (1 to 100, average 5), and hit_key_guid is different for each CSV line.
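Because hit_key_guid is different on every line, the Hit MERGE can never match an existing node; a sketch of an equivalent form that merges on the key alone and sets the remaining properties only on creation would be:

MERGE (hit:Hit { key: line.key })
ON CREATE SET hit.date_time = line.date_time, hit.session = line.session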
It's running really slowly on a pretty strong machine, and each 1000 rows can take up to 10 seconds.
I also checked with the profiler, and there is no "Eager" operator in the plan.
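The check was done by prefixing the statement, roughly like this (EXPLAIN shows the plan without running the load; PROFILE runs it and adds row counts):

EXPLAIN LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line FIELDTERMINATOR '|'
MERGE (session:Session { session: line.session })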
thanks
Lior