
Recently I noticed that uploading from CSV files (creating nodes and relationships) into my graph database has slowed down a great deal. While looking for the culprit, I began to wonder whether running multiple "CREATE INDEX ON :[node label](node property)" statements might be one of the reasons. Does anyone know the effect of repeatedly typing e.g. "CREATE INDEX ON :Person(Name)" against the same graph database? I realized that every time I upload another CSV file into the same database, I run "CREATE INDEX ON :Person(Name)". Does this mean an index on Person is created every time, resulting in multiple index files? Or is there only one index per unique label/property pair? Thank you.
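For reference, here is a rough sketch of the index statement I re-run before each upload (via py2neo, with the connection details assumed; the exact line in my script may differ slightly):

from py2neo import Graph

graph = Graph()  # assumes a local Neo4j instance with default settings

# Run before every CSV upload; the question is whether each run creates
# a new index file or reuses the one that already exists.
graph.cypher.execute("CREATE INDEX ON :Person(Name)")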

Here is the Cypher query I use to load the data from each CSV file. Each file contains three columns of double-precision data and about 5000 rows, and it takes roughly 300 seconds (5 minutes) to load each file, which seems far too long.

from py2neo import Graph
import glob

graph = Graph()  # connect to the local Neo4j instance (default URL)

for file_path in glob.glob("*.csv"):
    # LOAD CSV expects a URL, so prefix the file name with "file:"
    paths = 'file:' + file_path
    tx = graph.cypher.begin()
    qs1 = """LOAD CSV WITH HEADERS FROM {file_path} AS csvLine
             MATCH (p1:Person {Id: csvLine.name}), (p2:Product {Id: csvLine.ID})
             MERGE (p1)-[s:BOUGHT]-(p2)
               ON CREATE SET s.numbers  = csvLine.numbers,
                             s.price    = csvLine.price,
                             s.location = csvLine.location,
                             s.date     = csvLine.date"""
    tx.append(qs1, parameters={"file_path": paths})
    tx.commit()
user4279562

2 Answers


There is only one index for a given label/property combination. You can see the existing indexes by running :schema in the browser or schema in the shell.

Perhaps you can share your LOAD CSV statement, and the indexes you have?

It would also be best to share the profile information (prefix your statement with PROFILE in the shell), and be aware of the issues described here:

http://neo4j.com/developer/guide-import-csv/

Especially the eager loading.
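For example, a sketch of profiling the statement from the question in the neo4j-shell (the file URL and path are only illustrative):

PROFILE
LOAD CSV WITH HEADERS FROM "file:/path/to/one-file.csv" AS csvLine
MATCH (p1:Person {Id: csvLine.name}), (p2:Product {Id: csvLine.ID})
MERGE (p1)-[s:BOUGHT]-(p2)
  ON CREATE SET s.numbers = csvLine.numbers,
                s.price = csvLine.price,
                s.location = csvLine.location,
                s.date = csvLine.date;

The profile output shows which operators dominate the run and whether an Eager operator shows up in the plan.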

Michael Hunger
  • Thanks for the site info. I will check it out for the details. In the meantime, I updated my Cypher query in the question and would appreciate any suggestions. FYI, each file contains 3 columns of double-precision data with roughly 5000 rows. – user4279562 Feb 03 '15 at 00:16
  • Based on the reference, I find my CSV files are very clean, so I doubt the CSV files themselves are the problem. My suspicion is memory instead. Currently, the total memory on my server is 3854812 KB, of which 3631204 KB is used and 223576 KB is free. In neo4j.properties I currently set neostore.nodestore.db.mapped_memory=25M neostore.relationshipstore.db.mapped_memory=50M neostore.propertystore.db.mapped_memory=90M neostore.propertystore.db.strings.mapped_memory=130M neostore.propertystore.db.arrays.mapped_memory=130M. Do you think this is a memory issue? – user4279562 Feb 03 '15 at 16:01

As suspected, the culprit turned out to be memory. I followed the instructions under "Memory Config" in http://neo4j.com/developer/guide-import-csv/ and adjusted both (1) the neo4j-wrapper settings and (2) the memory-mapping settings appropriately for my server specs.

It should be noted that although the config file says "the Java heap size is dynamically calculated based on available system resources", a manual setting seems to be necessary for a medium or large dataset.
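For reference, these are the kinds of settings involved; the values below are only illustrative and should be sized to the machine's available RAM as the guide describes:

# conf/neo4j-wrapper.conf -- set the JVM heap explicitly (values in MB)
wrapper.java.initmemory=2048
wrapper.java.maxmemory=2048

# conf/neo4j.properties -- memory-mapped store files
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=100M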

user4279562