
I am trying to insert many new relationships between existing nodes. My current code takes too long for millions of relationships. Is there a way to optimise it?

from py2neo import *

g = Graph()
nodes = NodeMatcher(g)

for persons in relation:
    person_a, person_b = persons

    a = nodes.match("person",name=person_a).first()
    b = nodes.match("person",name=person_b).first()

    ab = Relationship(a, 'KNOWS', b)
    ab['date'] = '01-01-1980'

    g.create(ab)

Now assume two things:

  1. There are millions of relationships.
  2. To speed things up, I have a pickle dump containing all the nodes as py2neo.data.Node objects, so the nodes.match(...) step can be skipped.

Note: I am also open to any faster way of building the complete graph in bulk, including recreating the entire graph from scratch if that takes less time than adding the relationships. There are around 80K nodes.

kunal

2 Answers


Py2neo has a bulk API that you might find useful: https://py2neo.org/2021.1/bulk/index.html
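As a rough sketch of what that could look like for the question's data: the snippet below assumes py2neo 2021.1, a `person` label with a `name` property (taken from the question), and `relation` as an iterable of `(name_a, name_b)` pairs. The helper names `batched`, `to_triples` and `load_relationships` are made up for illustration. Matching on a key like this will probably benefit from an index on the `name` property; the question does not say whether one exists.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_triples(relation, date="01-01-1980"):
    """Turn (name_a, name_b) pairs into (start, properties, end) triples."""
    # Assumption: with a single key property, start/end are plain values.
    return [(a, {"date": date}, b) for a, b in relation]


def load_relationships(graph, relation, batch_size=5000):
    """Create KNOWS relationships in batches, matching nodes on person.name."""
    # Imported here so the helpers above work without py2neo installed.
    from py2neo.bulk import create_relationships

    for chunk in batched(to_triples(relation), batch_size):
        create_relationships(
            graph.auto(),  # one auto-commit transaction per batch
            chunk,
            "KNOWS",
            start_node_key=("person", "name"),
            end_node_key=("person", "name"),
        )
```

Calling `load_relationships(Graph(), relation)` would then replace the per-relationship loop from the question.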

Nigel Small
  • I have already tried this. It works (better than creating one relationship at a time), but it is still slow: around 100K relationships take approximately 40 minutes, in batches of 5,000. – kunal Jun 18 '21 at 11:22

I found the bulk API is quick if you do not use the start_node_key and end_node_key parameters and instead directly specify the ID.
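A minimal sketch of the ID-based variant, assuming py2neo 2021.1 and the `person`/`name` schema from the question. It fetches every node's internal ID in one query, then passes ID triples to `create_relationships` without any node keys, in which case the bulk API matches endpoints by node ID. The helper names are hypothetical. py2neo `Node` objects also expose the ID as `identity`, so the pickled nodes from the question could presumably supply it instead, provided they were bound to the graph when pickled.

```python
def fetch_node_ids(graph):
    """Return {name: internal node id} for every person node, in one query."""
    cursor = graph.run(
        "MATCH (p:person) RETURN p.name AS name, id(p) AS node_id"
    )
    return {record["name"]: record["node_id"] for record in cursor}


def to_id_triples(relation, node_ids, date="01-01-1980"):
    """Map (name_a, name_b) pairs to (start_id, properties, end_id) triples."""
    return [(node_ids[a], {"date": date}, node_ids[b]) for a, b in relation]


def load_by_id(graph, relation, batch_size=5000):
    """Create KNOWS relationships in batches, addressing nodes by ID."""
    # Imported here so the helpers above work without py2neo installed.
    from py2neo.bulk import create_relationships

    triples = to_id_triples(relation, fetch_node_ids(graph))
    for start in range(0, len(triples), batch_size):
        # No start_node_key/end_node_key: endpoints are matched by node ID.
        create_relationships(
            graph.auto(), triples[start:start + batch_size], "KNOWS"
        )
```

With around 80K nodes the name-to-ID dictionary is small, so the per-relationship lookup happens in Python memory rather than in the database.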

hwong557
  • To clarify: is it faster if we use only the node `ID` to create the relationships? Do you have a benchmark comparison? – kunal Jul 29 '22 at 06:24
  • Since writing the reply above, I have abandoned `py2neo`. I found it was better to write all my nodes and relationships to a CSV file and use a short Cypher script to load everything manually. It's still a pain, but it was far more performant than going through a Python wrapper. I don't have benchmarks. – hwong557 Jul 29 '22 at 13:04
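The CSV route described in that last comment might look roughly like the sketch below. It is not the commenter's actual script. The file name `knows.csv`, the column names and the helper functions are all invented for illustration, and the Cypher assumes Neo4j 4.4 or later for `CALL { ... } IN TRANSACTIONS`, run as an auto-commit query. Older versions used `USING PERIODIC COMMIT` for the same purpose.

```python
import csv

# Cypher to run once the CSV sits in Neo4j's import directory.
# Assumption: Neo4j 4.4+, executed as an auto-commit (implicit) transaction.
LOAD_KNOWS = """
LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row
CALL {
  WITH row
  MATCH (a:person {name: row.start_name})
  MATCH (b:person {name: row.end_name})
  CREATE (a)-[:KNOWS {date: row.date}]->(b)
} IN TRANSACTIONS OF 5000 ROWS
"""


def knows_rows(relation, date="01-01-1980"):
    """Return a header row followed by one row per (name_a, name_b) pair."""
    rows = [["start_name", "end_name", "date"]]
    rows.extend([a, b, date] for a, b in relation)
    return rows


def write_knows_csv(relation, path, date="01-01-1980"):
    """Write the relationship rows to `path` as a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(knows_rows(relation, date))
```

After `write_knows_csv(relation, "knows.csv")`, the file would be copied to the server's import directory and `LOAD_KNOWS` run from the Neo4j Browser or `cypher-shell`.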