py2neo - Match and Merge two nodes coming from two different csv, and create relationship

Question

I have a relational database whose tables I converted to CSV files. I imported two of them and created the nodes by specifying which columns to pick, as in the following code:

import csv
from py2neo import authenticate, Graph, Node

authenticate("localhost:7474", "neo4j", "my_password")
graph_db = Graph()
graph_db.delete_all()

# Import all rows and columns of the csv files
with open('File1.csv', "rb") as abc_file, open('File2.csv', "rb") as efg_file:
    data1 = csv.reader(abc_file, delimiter=';')
    data2 = csv.reader(efg_file, delimiter=';')
    next(data1)  # skip the header row of each file
    next(data2)

    # Create a node for every row of the "Contact Email" column of abc_file
    # (iterate over the csv reader, not the raw file object)
    for row in data1:
        nodes1 = Node("Contact_Email", email=row[0])
        graph_db.create(nodes1)

    # Create the nodes for all the rows of the "Building_Name" and
    # "Person_Created" columns of efg_file
    for row in data2:
        nodes2 = Node("Building_Name", name=row[0])
        nodes3 = Node("Person_Created", name=row[1])
        graph_db.create(nodes2, nodes3)

Let's say there are 60 emails in the "Contact_Email" column of "File1.csv", which is the primary key. It is used as a foreign key in the "Person_Created" column of "File2.csv". There are 14 buildings listed under "Building Name", each with a corresponding email in the "Person_Created" column. My questions are:

1) How can I match the 14 emails in the "Person_Created" column of File2.csv with the emails in the "Contact Email" column of File1.csv to avoid duplicates?

2) How can I create a relationship between the "Building Names" (in File2.csv) and "Person_Created" (in File1.csv) without any duplication, something like "Building1234 is DESIGNED_BY abc@xyz.com"?

How can I do it in py2neo with/without cypher?

score 0 · Answer 1 · answered Jun 23 '15 at 12:37

Create an index or a uniqueness constraint on the email property of the Contact Email nodes.

It is probably a good idea to give that node property a descriptive name, such as email.

While iterating through Person_Created, use the email foreign-key value to create a Contact Email node with the email property.

With the index or constraint in place, the node is created conditionally, that is, only if no node with that email already exists (provided it is written with MERGE rather than a plain CREATE).

Create the relationship between Person Created and Contact Email within the same iteration.
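The steps above could be sketched roughly as follows. This is only an illustration, assuming the py2neo 2.0 API (`graph.cypher.execute`) and Neo4j 2.x Cypher, where parameters are written `{name}`. The names `CONSTRAINT`, `MERGE_ROW`, `building_rows` and `load_buildings` are hypothetical, and the labels and the `DESIGNED_BY` type are taken from the question.

```python
import csv

# Neo4j 2.x Cypher; parameters are written {name} (newer servers use $name).
# Run CONSTRAINT once, before loading any rows.
CONSTRAINT = "CREATE CONSTRAINT ON (c:Contact_Email) ASSERT c.email IS UNIQUE"
MERGE_ROW = (
    "MERGE (c:Contact_Email {email: {email}}) "
    "MERGE (b:Building_Name {name: {building}}) "
    "MERGE (b)-[:DESIGNED_BY]->(c)"
)


def building_rows(csv_file):
    """Yield one parameter dict per data row of File2.csv, skipping the header."""
    reader = csv.reader(csv_file, delimiter=';')
    next(reader)
    for row in reader:
        if len(row) >= 2:  # ignore blank or short lines
            yield {"building": row[0].strip(), "email": row[1].strip()}


def load_buildings(graph, csv_file):
    """Send one MERGE statement per row; `graph` is assumed to be a py2neo 2.0 Graph."""
    for params in building_rows(csv_file):
        graph.cypher.execute(MERGE_ROW, params)
```

With the question's variables, this would be called as `graph_db.cypher.execute(CONSTRAINT)` once, then `load_buildings(graph_db, efg_file)` inside the `with` block. Because every clause is a MERGE, an email that was already loaded from File1.csv is matched instead of being created a second time.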

David Sequeira

Thanks for the reply. I followed the steps you mentioned and also tried the MATCH and MERGE methods between the defined nodes. Since the nodes come from different CSV files, I couldn't create the connection. Thanks anyway. Any more suggestions from others? – ylcnky Jun 23 '15 at 18:56

score 0 · Accepted Answer · answered Jun 24 '15 at 06:39

py2neo provides a number of uniqueness functions for just this. Have a look at this page to see merge_one and friends. The nodes these functions return can then be stored and used to build unique relationships and paths.
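A minimal sketch of that approach, assuming the py2neo 2.0 methods `Graph.merge_one(label, property_key, property_value)` and `Graph.create_unique(...)`. The helper names `read_pairs` and `link_buildings` are hypothetical, and the labels and relationship type come from the question.

```python
import csv


def read_pairs(csv_file):
    """Return (building, email) tuples from File2.csv, skipping the header."""
    reader = csv.reader(csv_file, delimiter=';')
    next(reader)
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def link_buildings(graph, pairs):
    """Merge both nodes for each pair and connect them exactly once."""
    # Imported lazily so read_pairs can be used without py2neo installed.
    from py2neo import Relationship

    contacts = {}  # email -> node, so each email is merged on the server once
    for building, email in pairs:
        if email not in contacts:
            # Matches the existing Contact_Email node or creates it if missing.
            contacts[email] = graph.merge_one("Contact_Email", "email", email)
        site = graph.merge_one("Building_Name", "name", building)
        # create_unique only adds the relationship if it is not already there.
        graph.create_unique(Relationship(site, "DESIGNED_BY", contacts[email]))
```

Here `merge_one` stands in for the MATCH-or-CREATE step, so the 14 emails in File2.csv resolve to nodes already created from File1.csv, and the stored node objects are reused when building the relationships.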

Note that for higher performance, though, you'll probably want to look at Cypher transactions or batches. Without these, each action requires a separate call to the server, which is slow at scale.
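To illustrate the transaction route, here is a rough sketch assuming the py2neo 2.0 transaction API (`graph.cypher.begin()`, `tx.append(...)`, `tx.commit()`) and Neo4j 2.x `{name}` parameters. The statement, the helper names `chunks` and `merge_in_batches`, and the batch size of 500 are illustrative assumptions, not values from the answer.

```python
# Neo4j 2.x Cypher; parameters are written {name} (newer servers use $name).
MERGE_ROW = (
    "MERGE (c:Contact_Email {email: {email}}) "
    "MERGE (b:Building_Name {name: {building}}) "
    "MERGE (b)-[:DESIGNED_BY]->(c)"
)


def chunks(rows, size):
    """Split a list of rows into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def merge_in_batches(graph, rows, batch_size=500):
    """Commit one Cypher transaction per batch instead of one request per row.

    `rows` is a list of dicts with "building" and "email" keys; `graph` is
    assumed to be a py2neo 2.0 Graph. Returns the number of commits made.
    """
    commits = 0
    for batch in chunks(rows, batch_size):
        tx = graph.cypher.begin()
        for params in batch:
            tx.append(MERGE_ROW, params)  # queued locally, not sent yet
        tx.commit()  # the queued statements are sent and committed together
        commits += 1
    return commits
```

For the 14 buildings in the question a single transaction is enough. Batching mainly matters once the CSV files grow to thousands of rows.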
