
I am using Python 3 in conjunction with py2neo (v3.1.2) to insert a large amount of data from MySQL into Neo4j. The table in MySQL has about 20 million rows. I want to do the insertion without first converting the MySQL data to CSV (the route suggested on Neo4j's website).

My code looks like the following:

transaction = graph_db.begin()
sql = "SELECT id FROM users"
cursor.execute(sql)
user_data = cursor.fetchall()
count = 1
for row in user_data:
    user_node = Node("User", user_id=row[0])
    transaction.create(user_node)
    if count % 10000 == 0:
        transaction.commit()
    count = count + 1

The goal was to insert in batches of 10,000. But the transaction breaks down right after the first batch of 10k is committed. The following is the error:

raise TransactionFinished(self)
py2neo.database.TransactionFinished: <py2neo.database.BoltTransaction object at 0x104e36588>

Can someone explain what this error means and how to solve this issue?

Jmj

1 Answer


I do not know Python well, but the problem is that inside the loop you commit the transaction and never open a new one. In py2neo, commit() finishes the transaction, so the next create() on the same transaction object raises TransactionFinished. Begin a fresh transaction at the start of each batch:

sql="SELECT id FROM users"
cursor.execute(sql)
user_data=cursor.fetchall()
count=1
for row in user_data:
    if count%10000==1:
        transaction=graph_db.begin()
    user_node=Node("User",user_id=row[0])
    transaction.create(user_node)
    if (count%10000==0) or (count==len(user_data)):
        transaction.commit()
    count=count+1
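
As a side note, the same batching idea can be pushed into Cypher itself with a parameterised UNWIND, so each batch becomes a single query rather than 10,000 create() calls. This is only a sketch, assuming the graph_db and user_data objects from the question; the query text and the batch size of 10,000 are illustrative:

# Sketch: send each batch of ids as one parameterised UNWIND statement.
# Assumes graph_db and user_data are set up as in the question.
batch = []
for row in user_data:
    batch.append(row[0])
    if len(batch) == 10000:
        # one round trip creates the whole batch of nodes
        graph_db.run("UNWIND {ids} AS uid CREATE (:User {user_id: uid})", ids=batch)
        batch = []
if batch:
    # flush the final partial batch
    graph_db.run("UNWIND {ids} AS uid CREATE (:User {user_id: uid})", ids=batch)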
stdob--
  • I don't think this would work. How would the first 10000 nodes be created without the transaction object being declared (since you have initialised it under the if condition)? – Jmj Feb 23 '17 at 13:51
  • I tried it. Got the same error again. Is it related to the speed with which neo adds nodes to the db, which may be more than the time taken to form the next batch of insertions? – Jmj Feb 23 '17 at 14:31
  • Hey, my bad, I set one of the values wrong. It works fine now, thanks! On a side-note, is this method feasible for such large databases or should I really go for the SQL dump->CSV->Neo insertion route? – Jmj Feb 23 '17 at 15:17
  • In my view, it is better to use the CSV import with periodic commit (see the sketch below). – stdob-- Feb 23 '17 at 15:20
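
For reference, the periodic-commit CSV route mentioned in the last comment would look roughly like the sketch below. It assumes the table has been dumped to a users.csv file with a header row containing an id column, placed where the Neo4j server can read it (e.g. its import directory); the file name and column name are placeholders:

# Sketch: USING PERIODIC COMMIT must run as its own auto-commit
# statement, which graph_db.run() provides in py2neo v3.
# CSV fields are read as strings; toInteger() casts the id
# (toInt() on older Neo4j 3.0 servers).
graph_db.run(
    "USING PERIODIC COMMIT 10000 "
    "LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS line "
    "CREATE (:User {user_id: toInteger(line.id)})"
)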