
I have a large data set of 22 million records in JSON format, and I use apoc.periodic.iterate together with apoc.mongodb to import it from a MongoDB database into Neo4j. After about 3 million records have been imported and 6 GB of memory is in use, the connection to the server is lost and a heap-size exception occurs. I edited the config file to set the heap and page-cache sizes, but the changes did not take effect, which is the main problem. Running the code in the Neo4j Browser and through the Python driver gives the same result.

However, if I import the data manually in chunks of 2.5 million (import the first 2.5 million records, then SKIP them in the next query execution and import the next 2.5 million, and so on), it works. I actually want to do this with the Python driver, but I couldn't reproduce this manual approach with it.
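The manual workaround described above (import one window, then SKIP it on the next run) can be reproduced with the Python driver by looping over (skip, limit) windows. This is a minimal sketch under assumptions: `batch_windows` and `import_in_windows` are hypothetical helper names, and `query` stands for your own import query (for example the apoc.periodic.iterate call you already run by hand) rewritten to accept `$skip` and `$limit` parameters. It automates the workaround only; it does not explain why the heap settings were not applied.

```python
from typing import Iterator, Tuple


def batch_windows(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (skip, limit) pairs that together cover `total` records in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for skip in range(0, total, batch_size):
        # The final window may hold fewer than batch_size records.
        yield skip, min(batch_size, total - skip)


def import_in_windows(driver, query: str, total: int, batch_size: int) -> None:
    """Run `query` once per window as a separate auto-commit query.

    `driver` is a neo4j Driver; `query` is assumed (hypothetically) to use
    the $skip and $limit parameters to select one window of source records.
    """
    for skip, limit in batch_windows(total, batch_size):
        # A new session per window mirrors running the query by hand each time.
        with driver.session() as session:
            # consume() blocks until the server has finished this window.
            session.run(query, skip=skip, limit=limit).consume()
        print(f"finished window starting at {skip} ({limit} records)")
```

For example, with `driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))`, importing 22 million records in 2.5 million windows would be `import_in_windows(driver, query, 22_000_000, 2_500_000)`, which issues nine runs. `session.run` sends each window as an auto-commit query; since apoc.periodic.iterate already commits its own inner batches, there is little reason to wrap it in an explicit transaction.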

There is an error in the log file that says: fatal error occurred during protocol handshaking... An established connection was aborted by the software in your host machine...

ali
  • Try splitting the whole task into small paginated chunks. If you load such a large amount of data at once, it will definitely break. Instead, load the data in paginated mode or write it in small batches of inserts. – Kiran Maniya Sep 02 '19 at 08:37
  • As I said, I don't want to load this data all at once; of course I am already using a batch-import mechanism. – ali Sep 02 '19 at 10:36

1 Answer


For the first part, please post your configuration. For the Python driver, you can simply use Cypher with a list-of-maps parameter:

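from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))


def add_friend(tx, name, friend_name):
    # One record per transaction: simple, but slow for millions of records.
    tx.run("MERGE (a:Person {name: $name}) "
           "MERGE (b:Person {name: $friend_name}) "
           "MERGE (a)-[:KNOWS]->(b)",
           name=name, friend_name=friend_name)


def add_friend_list(tx, friend_list):
    # Many records per transaction: UNWIND turns the list-of-maps
    # parameter into one row per map.
    tx.run("UNWIND $friend_list AS row "
           "MERGE (a:Person {name: row.name}) "
           "MERGE (b:Person {name: row.friend_name}) "
           "MERGE (a)-[:KNOWS]->(b)",
           friend_list=friend_list)


def fetch_data_from_db():
    # Should return a list of dicts with "name" and "friend_name" keys.
    ...


with driver.session() as session:
    friends_list = fetch_data_from_db()
    # Use one of the two options below; execute_write is called
    # write_transaction in driver 4.x.
    # Option 1: one transaction per record.
    for friend in friends_list:
        session.execute_write(add_friend, friend["name"], friend["friend_name"])
    # Option 2: one transaction for the whole list.
    session.execute_write(add_friend_list, friends_list)

driver.close()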
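Passing all 22 million records as a single `$friend_list` parameter would put the whole list into one transaction, which can easily exhaust the heap. A minimal sketch of splitting the records into fixed-size batches, each written in its own transaction, is shown here; `chunked` and `write_in_batches` are hypothetical helper names, `work` is a transaction function such as `add_friend_list` above, and the default batch size of 10,000 is an assumption to tune. `chunked` consumes its input lazily, so a pymongo cursor can be passed in without first loading every document into a Python list.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def chunked(records: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Lazily split any iterable into lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_in_batches(session, work: Callable, records: Iterable[Any],
                     size: int = 10_000) -> int:
    """Call `work(tx, batch)` in one write transaction per batch.

    `session` is a neo4j Session; size=10_000 is an assumed default.
    Returns the number of records handed to `work`.
    """
    written = 0
    for batch in chunked(records, size):
        # execute_write is the driver 5.x name; 4.x calls it write_transaction.
        session.execute_write(work, batch)
        written += len(batch)
    return written
```

Usage, assuming the driver and functions from the answer: `with driver.session() as session: write_in_batches(session, add_friend_list, fetch_data_from_db())`. Each batch commits independently, so a failure part-way leaves earlier batches in place; because the queries use MERGE, re-running the import does not duplicate nodes.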
Farhoud