I am new to Cypher. I am building a simple graph using GrapheneDB and py2neo (version 2.0.2).
In my simple graph I have Repository, Organization & People nodes. IN_ORGANIZATION and IS_ACTOR are two types of relationships. Below is the code snippet for creating nodes and relationships (entire code on GitHub, refer to lines 88-108):
# Create the repository node if one does not exist
r = graph.merge_one("Repository", "id", record["full_name"])
# Update the timestamp with the current time in epoch milliseconds
r.properties["created_at"] = MyMoment.TNEM()
# Apply the property change
r.push()
...
# Create the organization node if one does not exist
o = graph.merge_one("Organization", "id", record["organization"])
# Update the timestamp with the current time in epoch milliseconds
o.properties["created_at"] = MyMoment.TNEM()
# Apply the property change
o.push()
rel = Relationship(r, "IN_ORGANIZATION", o)
# Create a unique relationship between repository and organization;
# ignored if the relationship already exists
graph.create_unique(rel)
...
# Create the people (actor) node if one does not exist
p = graph.merge_one("People", "id", al)
# Update the timestamp with the current time in epoch milliseconds
p.properties["created_at"] = MyMoment.TNEM()
# Apply the property change
p.push()
rel = Relationship(r, "IS_ACTOR", p)
# Create a unique relationship between repository and people;
# ignored if the relationship already exists
graph.create_unique(rel)
The above code works very well on a small data set. Once the data set grows to the point where ~20K nodes and ~15K relationships are created/merged each hour, the processing time exceeds an hour (sometimes several hours). I need to reduce the processing time. What other options can I explore? I was thinking of batch mode. How can I use it with merge_one and create_unique? Any ideas?
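To make the batch idea concrete, here is a rough sketch of what I have in mind, not something I have benchmarked. It replaces the per-record merge_one / push / create_unique round trips with one Cypher MERGE statement per record, queued on a py2neo 2.0 Cypher transaction (graph.cypher.begin(), tx.append(), tx.commit()) and committed in chunks. The function name merge_in_batches, the batch_size default, and the assumption that records is a list of dicts with "full_name" and "organization" keys are mine and only illustrative. The statement uses the {param} placeholder syntax of Neo4j 2.x.

```python
# One statement that does the work of merge_one + push + create_unique
# for a repository and its organization. SET (rather than ON CREATE SET)
# mirrors the snippet above, which overwrites created_at on every merge.
MERGE_REPO_ORG = (
    "MERGE (r:Repository {id: {repo_id}}) "
    "MERGE (o:Organization {id: {org_id}}) "
    "MERGE (r)-[:IN_ORGANIZATION]->(o) "
    "SET r.created_at = {now}, o.created_at = {now}"
)


def merge_in_batches(graph, records, now_ms, batch_size=500):
    """Queue one MERGE statement per record and commit every batch_size records.

    graph      -- a py2neo 2.0 Graph (anything exposing graph.cypher.begin())
    records    -- iterable of dicts with "full_name" and "organization" keys
                  (assumed shape, for illustration only)
    now_ms     -- timestamp in epoch milliseconds, e.g. MyMoment.TNEM()
    batch_size -- hypothetical tuning knob; statements per transaction
    Returns the number of records queued.
    """
    total = 0
    pending = 0
    tx = None
    for record in records:
        if tx is None:
            # Open a transaction lazily so empty input sends nothing
            tx = graph.cypher.begin()
        tx.append(MERGE_REPO_ORG, {
            "repo_id": record["full_name"],
            "org_id": record["organization"],
            "now": now_ms,
        })
        pending += 1
        total += 1
        if pending >= batch_size:
            # Send the queued statements in one request and commit them
            tx.commit()
            tx = None
            pending = 0
    if tx is not None:
        # Commit whatever is left over from the last partial batch
        tx.commit()
    return total
```

Usage would be something like merge_in_batches(graph, records, MyMoment.TNEM()). The IS_ACTOR relationships could presumably follow the same pattern with a second statement that merges the People node and the relationship. Is this a reasonable direction, or is there a better way?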