Say I want to update a sizeable number of existing nodes using data that is stored, for instance, in a pd.DataFrame. Since I know how to write a parametrized query that handles a single node update, my basic solution is to put this query in a loop and run it once for each row in the data frame.

for _, row in df.iterrows():
    query = '''MATCH (p:Person)
               WHERE p.name = {name} AND p.surname = {surname}
               SET p.description = {description}'''

    tx.run(query, name=row['name'], surname=row['surname'],
           description=row['description'])

However, there must be a more direct (and faster) way of passing this information to the query, so that the iteration is "managed" on the server side. Is that true? I haven't been able to find any documentation for that.

HerrIvan

3 Answers

You can do that by running a Cypher LOAD CSV query and providing a CSV file containing your data:

LOAD CSV WITH HEADERS FROM 'file:///file.csv' AS csvLine FIELDTERMINATOR ';'
MATCH (p:Person {name: csvLine.name, surname: csvLine.surname})
SET p.description = csvLine.description

But I don't think there is a way to pass an array of data to a MATCH loop.
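
For the LOAD CSV route above, the file has to be produced first. Here is a minimal sketch using only Python's standard `csv` module, assuming the rows are available as a list of dicts with the keys `name`, `surname` and `description` (for a data frame, something like `df.to_dict(orient='records')`). The helper name `write_records_csv` is hypothetical, not part of any library.

```python
import csv

def write_records_csv(records, path, fieldnames=("name", "surname", "description")):
    """Write a list of dicts to a ';'-delimited CSV file with a header row.

    The ';' delimiter matches the FIELDTERMINATOR used in the LOAD CSV query.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames), delimiter=";")
        writer.writeheader()
        writer.writerows(records)
```

Note that a `file:///` URL is typically resolved against the Neo4j import directory, so the file would have to be written or copied there, depending on the server configuration.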

Muldec

Instead of looping like this, with one Cypher query executed per entry, you should gather all the entries into a single list parameter of map objects and run one Cypher query (though you may want to batch this if you have more than 100k or so entries to process). Michael Hunger has a good blog entry on this approach.

You can use UNWIND on the list parameter to transform it into rows, and handle everything all at once. Assuming you pass in the list as data:

UNWIND $data AS row
MATCH (p:Person)
WHERE p.name = row.name AND p.surname = row.surname
SET p.description = row.description
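
The batching mentioned above could be sketched on the Python side roughly as follows. This is only an illustration: `chunked`, `update_in_batches` and the default batch size are hypothetical names and values, and `run` stands for whatever callable executes a query with keyword parameters (for example `graph.run` or `tx.run`).

```python
UPDATE_QUERY = """
UNWIND $data AS row
MATCH (p:Person)
WHERE p.name = row.name AND p.surname = row.surname
SET p.description = row.description
"""

def chunked(records, size):
    """Yield consecutive slices of `records`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]

def update_in_batches(run, records, batch_size=10000):
    """Call `run(query, data=batch)` once per batch; return the number of calls."""
    calls = 0
    for batch in chunked(records, batch_size):
        # Each call sends one list of maps that UNWIND expands on the server.
        run(UPDATE_QUERY, data=batch)
        calls += 1
    return calls
```

With 25 records and `batch_size=10`, this would issue three queries instead of 25.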
InverseFalcon

The essential problem is already addressed in the answer by InverseFalcon, but to provide a complete answer including the py2neo and pandas bits, I post the code below:

query = '''UNWIND {batch} AS row
           MATCH (p:Person)
           WHERE p.name = row.name AND p.surname = row.surname
           SET p.description = row.description'''

graph.run(query, batch=df.to_dict(orient='records'))

So, in the end, this was more of a Neo4j than a py2neo question, and the relevant piece of info in Neo4j's docs is here
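
One detail the code above leaves open is missing values: pandas usually represents an empty cell as a float NaN, and `df.to_dict(orient='records')` passes it through unchanged. Here is a minimal sketch, assuming you would rather send a null than a NaN to the query. The helper `nan_to_none` is hypothetical, and whether null is the right choice depends on your data (in Cypher, setting a property to null removes it).

```python
import math

def nan_to_none(records):
    """Return a copy of `records` with float NaN values replaced by None."""
    cleaned = []
    for record in records:
        cleaned.append({
            # Only floats can be NaN; other values are passed through as-is.
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned
```

It would be applied just before the call, e.g. `graph.run(query, batch=nan_to_none(df.to_dict(orient='records')))`.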

HerrIvan