3

Currently I'm struggling to find a performant way to run multiple queries with py2neo. My problem is that I have a big list of write queries in Python that need to be written to Neo4j.

I have tried several ways to solve this so far. The approach that worked best for me was the following:

from py2neo import Graph

queries = ["create (n) return id(n)", "create (n) return id(n)", ...]  # list of queries
g = Graph()
t = g.begin(autocommit=False)
for idx, q in enumerate(queries):
    t.run(q)
    if (idx + 1) % 100 == 0:  # commit after every 100 queries
        t.commit()
        t = g.begin(autocommit=False)
t.commit()

It still takes too long to write the queries. I also tried the "run many" from APOC, without success; the query never finished. I also tried the same writing method with autocommit. Is there a better way to do this? Are there any tricks, like dropping indexes first and then adding them back after inserting the data?

-- Edit: Additional information:

I'm using Neo4j 3.4, Py2neo v4 and Python 3.7

Bierbarbar
  • 1,399
  • 15
  • 35
  • Are you creating a lot of the same thing? i.e. `CREATE (n:Person {name: 'Joe'}) RETURN id(n)` * 1000 different people? If you are not using indexes to create do you need them later for querying? If so might as well leave them in, if not then do you even need them? Are you initializing a graph from scratch, if so there might be some other tools available to more quickly start it up. – jacob.mccrumb Nov 13 '18 at 19:32
I'm sending a big variety of query types to Neo4j, including CREATE for nodes and MERGE and CREATE for relationships. The queries can differ a lot. I need the indexes for querying later. No, unfortunately I didn't build up the graph from scratch. – Bierbarbar Nov 14 '18 at 06:45
  • Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :) – jacob.mccrumb Nov 14 '18 at 14:38
  • 1
    Oh yeah, sorry, I forgot: I'm using py2neo v4 and the newest version of Neo4j from Docker Hub. There are no returns; the queries I gave are basically placeholders, but I only have write queries where I don't care about the return. I tracked run times; no single query is slow in particular, it's only sending so many queries that causes problems. – Bierbarbar Nov 15 '18 at 09:33
  • 1
    Sounds good -- see @InverseFalcon 's answer below. For creating a lot of similar things use UNWIND + query parameters: `UNWIND $list AS item MERGE (n:Node {foo: item.bar})` and call it with a parameter list containing the props for each node you want to create. And make sure you are using bolt to connect, not http. – jacob.mccrumb Nov 15 '18 at 22:04

1 Answer

4

You may want to read up on Michael Hunger's tips and tricks for fast batched updates.

The key trick is using UNWIND to transform list elements into rows, and then subsequent operations are performed per row.

There are supporting functions that can easily create lists for you, like range().

As an example, if you wanted to create 10k nodes and add a name property, then return the node name and its graph id, you could do something like this:

UNWIND range(1, 10000) as index
CREATE (n:Node {name:'Node ' + index})
RETURN n.name as name, id(n) as id

Likewise, if you have a good amount of data to import, you can create a list of parameter maps, call the query with that list as a parameter, then UNWIND the list and operate on each entry, similar to how we process CSV files with LOAD CSV.
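To sketch what that looks like from py2neo: build one parameter map per node, batch the maps, and send each batch in a single UNWIND query instead of one query per node. This is only a minimal sketch; the `Node` label, the `name` property, and the batch size of 1000 are placeholder assumptions, and the py2neo calls (commented out) assume py2neo v4 with a reachable Neo4j instance.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# One parameter map per node, instead of one Cypher query per node.
params = [{"name": "Node %d" % i} for i in range(10000)]

# A single query processes a whole batch; UNWIND turns the list into rows.
query = """
UNWIND $params AS row
CREATE (n:Node {name: row.name})
"""

# Hypothetical py2neo v4 usage (requires a running Neo4j server):
# from py2neo import Graph
# g = Graph()
# for batch in chunks(params, 1000):
#     g.run(query, params=batch)
```

This cuts the number of round trips from one per node to one per batch, which is usually where the time goes when sending many small write queries.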

InverseFalcon
  • 29,576
  • 4
  • 38
  • 51