When importing large CSV files (>200MB) into Neo4j, the client response hangs. The query does complete and all records are imported, but there appears to be some kind of response timeout, so there is no indication that the import query has finished. This is a problem because we cannot automate importing multiple files into Neo4j: the script keeps waiting for the query to finish even though it already has.
Importing 1 file takes around 10-15 minutes.
No errors are thrown anywhere in the pipeline; everything simply hangs. I can only tell that the process has completed because the VM's CPU activity dies down.
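As an independent progress check, polling the node count from a second session should show when the import has actually finished, regardless of what the client reports. A rough sketch (the bolt URI and credentials are placeholders; the :Author label matches the query further down):

from py2neo import Graph

g = Graph("bolt://<host>:7687", auth=("<user>", "<password>"))
# The count stops growing once the import has actually finished.
print(g.evaluate("MATCH (a:Author) RETURN count(a)"))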
This process does work on smaller files: an acknowledgement comes back once a file has finished importing, and the script moves on to the next one.
I have tried running the script from a Jupyter notebook and as a plain Python script directly on the console. I have even run the query directly on Neo4j through the browser console. Every approach results in a hanging query, so I am not sure whether the issue comes from Neo4j or from Py2Neo.
Example query:
USING PERIODIC COMMIT 1000
LOAD CSV FROM {csvfile} AS line
MERGE (:Author { authorid: line[0], name: line[1] } )
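To help rule Py2Neo in or out, the same statement could also be issued through the official neo4j Python driver; if that call also blocks after the import completes, the problem is presumably on the server side rather than in Py2Neo. An untested sketch (URI and credentials are placeholders; query and csvfile are the statement and parameter from this post):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://<host>:7687", auth=("<user>", "<password>"))
with driver.session() as session:
    result = session.run(query, csvfile=csvfile)
    # summary() blocks until the server reports the statement as finished,
    # so a timestamp around it would show exactly where the wait happens.
    print(result.summary().counters)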
Modified Python script using Py2Neo:
from azure.storage.blob import BlockBlobService
from py2neo import Graph

# Placeholders for the actual connection details.
mygraph = Graph("bolt://<host>:7687", auth=("<user>", "<password>"))
blob_service = BlockBlobService(account_name="<name>", account_key="<key>")

query = """USING PERIODIC COMMIT 1000
LOAD CSV FROM {csvfile} AS line
MERGE (:Author { authorid: line[0], name: line[1] } )"""

csv_file_base = "http://<base_uri>/parsed-csv-files/"
for blob in blob_service.list_blobs("parsed-csv-files"):
    print(blob.name)
    csvfile = csv_file_base + blob.name
    params = {"csvfile": csvfile}
    mygraph.run(query, parameters=params)  # hangs here on large files
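To see where the loop actually blocks, timing the run() call and then forcing the result summary to be consumed may narrow things down. A sketch, under my assumption that py2neo v4's Cursor.stats() pulls the query summary and therefore cannot return before the server finishes the statement:

import time

for blob in blob_service.list_blobs("parsed-csv-files"):
    csvfile = csv_file_base + blob.name
    start = time.time()
    cursor = mygraph.run(query, parameters={"csvfile": csvfile})
    print(blob.name, "run() returned after", round(time.time() - start, 1), "s")
    # stats() reads the query summary, so it should only return once the
    # server has finished executing the statement.
    print(cursor.stats())
    print(blob.name, "summary received after", round(time.time() - start, 1), "s")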
Neo4j debug.log does not seem to be recording any errors.
Sample debug.log:
2019-05-30 05:44:32.022+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job finished: descriptor=IndexRule[id=16, descriptor=Index( UNIQUE, :label[5](property[5]) ), provider={key=native-btree, version=1.0}, owner=42], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/16/index-16 Number of pages visited: 598507, Number of cleaned crashed pointers: 0, Time spent: 2m 25s 235ms
2019-05-30 05:44:32.071+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job closed: descriptor=IndexRule[id=16, descriptor=Index( UNIQUE, :label[5](property[5]) ), provider={key=native-btree, version=1.0}, owner=42], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/16/index-16
2019-05-30 05:44:32.071+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job started: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19
2019-05-30 05:44:57.126+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job finished: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19 Number of pages visited: 96042, Number of cleaned crashed pointers: 0, Time spent: 25s 55ms
2019-05-30 05:44:57.127+0000 INFO [o.n.k.i.i.s.GenericNativeIndexProvider] Schema index cleanup job closed: descriptor=IndexRule[id=19, descriptor=Index( UNIQUE, :label[6](property[6]) ), provider={key=native-btree, version=1.0}, owner=46], indexFile=/data/databases/graph.db/schema/index/native-btree-1.0/19/index-19
EDIT: I switched to the simpler query shown above, which still exhibits the same issue.