I am using the community edition of neo4j. I am trying to create 50,000 nodes and 93,400 relationships from a CSV file, but the LOAD CSV command takes around 40 minutes to create them. I use the py2neo package in Python to connect and run the Cypher queries. The LOAD CSV command looks similar to the one below:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Sample.csv" AS row
WITH row
MERGE (animal:Animal {name: row.`ANIMAL_NAME`})
ON CREATE SET animal = {name: row.`ANIMAL_NAME`, type: row.`TYPE`, status: row.`Status`, birth_date: row.`DATE`}
ON MATCH SET animal += {name: row.`ANIMAL_NAME`, type: row.`TYPE`, status: row.`Status`, birth_date: row.`DATE`}
MERGE (person:Person {name: row.`PERSON_NAME`})
ON CREATE SET person = {name: row.`PERSON_NAME`, age: row.`AGE`, address: row.`Address`, birth_date: row.`PERSON_DATE`}
ON MATCH SET person += {name: row.`PERSON_NAME`, age: row.`AGE`, address: row.`Address`, birth_date: row.`PERSON_DATE`}
MERGE (person)-[:OWNS]->(animal);

Infrastructure details:

dbms.memory.heap.max_size=16384M

dbms.memory.heap.initial_size=2048M

dbms.memory.pagecache.size=512M

neo4j_version:3.3.9

How can I make it run faster? Thanks in advance.

ck22
  • There are multiple syntax errors in your query. Can you show your actual query (and also use multiple lines to make it readable)? – cybersam May 14 '20 at 19:46
  • @cybersam Thanks for responding. I edited the query to make it more readable, but the real question is how to improve the performance: can it be achieved by increasing the resources or by optimising the query? BTW, I am creating the node indexes before this query. – ck22 May 14 '20 at 20:02
  • You still have a few syntax errors, but I can make educated guesses as to what you are trying to do. What indexes do you already have? Also, do you really need to update the properties of nodes that already exist? And what neo4j version are you using? – cybersam May 14 '20 at 20:03
  • The syntax errors are there because I had to modify the query a bit, but the actual query runs successfully. I created the indexes for the Animal and Person nodes using 'CREATE INDEX ON :Animal(name) \n'. Yes, I need to update the properties if there is a change in the data. – ck22 May 14 '20 at 20:10
  • neo4j_version:3.3.9 – ck22 May 14 '20 at 20:17
  • @cybersam We have upgraded to 4.0.4 enterprise edition and made the query changes you suggested, but it still takes the same amount of time. We also have a query that creates only nodes (no relationships) from a CSV file with 15k rows. It took around 7 minutes to complete, and EXPLAIN on that query showed no Eager operations. – ck22 May 18 '20 at 12:34

1 Answer


Ideally, you should be using the latest neo4j version, as there have been many performance improvements since 3.3.9. Since you already have indexes on :Animal(name) and :Person(name), the other main issue is probably that the Cypher planner is generating an expensive Eager operation (at least in neo4j 4.0.3) for your query. Whenever you have performance issues, you should use EXPLAIN or PROFILE to see the operations that the Cypher planner generates.
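Since the question runs its queries from Python, here is a minimal sketch of checking a plan for Eager operators programmatically. It assumes the plan is available as a nested dict with "operatorType" and "children" keys; that shape, the helper names explain_query and find_eager_operators, and the sample plan are illustrative assumptions, and how you obtain the plan from your driver is not shown.

```python
def explain_query(cypher: str) -> str:
    """Prefix a Cypher statement with EXPLAIN so it is planned but not executed."""
    return "EXPLAIN " + cypher.lstrip()


def find_eager_operators(plan: dict) -> list:
    """Collect operator names that start with 'Eager' from a nested plan dict.

    Assumed (hypothetical) shape: {"operatorType": str, "children": [plan, ...]}.
    Note that this also matches operators such as 'EagerAggregation'.
    """
    found = []
    stack = [plan]
    while stack:
        node = stack.pop()
        name = node.get("operatorType", "")
        if name.startswith("Eager"):
            found.append(name)
        stack.extend(node.get("children", []))
    return found


# Hand-written plan for illustration only; a real plan comes from the database.
sample_plan = {
    "operatorType": "ProduceResults",
    "children": [
        {
            "operatorType": "EmptyResult",
            "children": [
                {
                    "operatorType": "Apply",
                    "children": [
                        {
                            "operatorType": "Eager",
                            "children": [{"operatorType": "LoadCSV"}],
                        },
                        {"operatorType": "MergeCreateNode"},
                    ],
                }
            ],
        }
    ],
}

print(find_eager_operators(sample_plan))
```

An empty list suggests the planner did not add an Eager step; a non-empty one points to the part of the query worth simplifying.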

Try using this simpler query (which should do the same thing as yours). Using EXPLAIN in neo4j 4.0.3, this query does not use the Eager operation:

:auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Test.csv" AS row
MERGE(animal:Animal {name: row.`ANIMAL_NAME`})
SET animal += {type:row.`TYPE`, status:row.`Status`, birth_date:row.`DATE`}
MERGE (person:Person { name:row.`PERSON_NAME`})
SET person += {age:row.`AGE`, address:row.`Address`, birth_date:row.`PERSON_DATE`}
MERGE (person)-[:OWNS]->(animal);

The :auto command is required in neo4j 4.x when using USING PERIODIC COMMIT.
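Note that :auto is a Neo4j Browser client command rather than part of Cypher, so a statement sent from Python (as in the question) would presumably need the prefix removed first. A minimal sketch, where strip_browser_prefix is a hypothetical helper; submitting the statement is left out, and it would still have to run as an auto-commit transaction for USING PERIODIC COMMIT to be accepted.

```python
def strip_browser_prefix(statement: str) -> str:
    """Remove a leading ':auto' Browser command from a statement, if present."""
    parts = statement.split(None, 1)  # split off the first whitespace-delimited token
    if parts and parts[0] == ":auto":
        return parts[1] if len(parts) > 1 else ""
    return statement.lstrip()


browser_statement = (
    ':auto USING PERIODIC COMMIT LOAD CSV WITH HEADERS '
    'FROM "file:///Test.csv" AS row RETURN row'
)
driver_statement = strip_browser_prefix(browser_statement)
print(driver_statement)
```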

cybersam
  • Thanks for your input, I will try your suggestions. Hopefully it works much faster. – ck22 May 14 '20 at 20:53