Recently I noticed that loading CSV files (creating nodes and relationships) into my graph database has slowed down a great deal. While looking for the culprit, I started wondering whether running "CREATE INDEX ON :[node label](node property)" multiple times might be one of the reasons. Does anyone know the effect of running, e.g., "CREATE INDEX ON :Person(Name)" repeatedly on the same graph database? I realized that every time I load another CSV file into the same database, I run "CREATE INDEX ON :Person(Name)" again. Does this mean a new index on Person is created every time, resulting in multiple index files? Or is there only one index for each unique label and property pair? Thank you.
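To make the pattern concrete, here is a minimal sketch of how the index statement could be sent only once per script run instead of once per CSV file. The helper name `ensure_index`, the `run` callable, and the `created` set are all hypothetical and only for illustration; with py2neo 2.0, `run` could presumably be `graph.cypher.execute`. The sketch says nothing about how the server itself treats a repeated CREATE INDEX.

```python
def ensure_index(run, label, prop, created):
    """Issue CREATE INDEX once per (label, property) pair per script run.

    `run` is any callable that executes a Cypher string (hypothetical hook);
    `created` is a set remembering the pairs already handled.
    Returns True if a statement was sent, False if it was skipped.
    """
    key = (label, prop)
    if key in created:
        return False
    run("CREATE INDEX ON :%s(%s)" % (label, prop))
    created.add(key)
    return True


# Stand-in executor that only records the statements it receives.
sent = []
created = set()
for _ in range(3):  # e.g. one iteration per CSV file
    ensure_index(sent.append, "Person", "Name", created)
```

With this in place, `sent` holds a single statement no matter how many files are processed in the same run.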
Each file contains three columns of double-precision data and 5000 rows. It takes about 300 seconds (5 min) to load each file, which seems far too long. Here is the py2neo script with my Cypher query:
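from py2neo import Graph
import glob

# Connects to the default local server; adjust the URI if yours differs.
graph = Graph()

# {file_path} is a Cypher parameter, filled in by tx.append() below.
qs1 = """
LOAD CSV WITH HEADERS FROM {file_path} AS csvLine
MATCH (p1:Person {Id: csvLine.name}), (p2:Product {Id: csvLine.ID})
MERGE (p1)-[s:BOUGHT]-(p2)
ON CREATE SET s.numbers = csvLine.numbers,
              s.price = csvLine.price,
              s.location = csvLine.location,
              s.date = csvLine.date
"""

# Load every CSV file in the current directory, one transaction per file.
for file_path in glob.glob("*.csv"):
    paths = 'File:' + file_path
    tx = graph.cypher.begin()
    tx.append(qs1, parameters={"file_path": paths})
    tx.commit()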
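To see whether every file takes about 300 seconds or only some of them, the loop can be wrapped in a small timing harness. This is only a sketch: `to_file_url` and `time_loads` are hypothetical helper names, the `load` callable stands in for the `tx.append(...)` / `tx.commit()` pair, and the absolute `file://` URL assumes the Neo4j server runs on the same machine and can read that path.

```python
import os
import time
from pathlib import Path


def to_file_url(path):
    """Turn a local path into an absolute file:// URL."""
    return Path(os.path.abspath(path)).as_uri()


def time_loads(paths, load):
    """Call load(url) for each path and return a list of (url, seconds)."""
    timings = []
    for path in paths:
        url = to_file_url(path)
        start = time.perf_counter()
        load(url)
        timings.append((url, time.perf_counter() - start))
    return timings


# Stand-in loader that does nothing; a real one would run the
# LOAD CSV query for the given URL and commit the transaction.
timings = time_loads(["a.csv", "b.csv"], lambda url: None)
```

Printing `timings` after a real run shows the elapsed seconds per file, which helps separate a slow query from a one-off cost such as index population.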