I have some data in CSV format. Each row describes a relationship between two nodes. A node can have multiple relationships with other nodes, but each row holds exactly one relationship.

For example, this is the relationship derived from one row of the CSV:

(Entity1:Name {Type,Details})-[Relationship Type]->(Entity2:Name {Type,Details})

On a different row, Entity1 could be in the place of Entity2, and vice versa. Entity1 could also have relationships with Entity3, Entity4, and so on, although its type will never change.

Using Py2Neo I can import the data and create the relationships, but I end up with loads of duplicate nodes for the entities. I want each Entity node to be unique. This is what I have:

    import itertools

    import pandas as pd
    from py2neo import Graph, Node, Relationship


    def payload1(entity1, type1, relationtype, details_one, entity2, type2, details_two):
        graph = Graph("bolt://localhost:7687", auth=("Admin", "Password"))
        Entity1 = Node(type1, Name=entity1, Details=details_one)
        Entity2 = Node(type2, Name=entity2, Details=details_two)
        # create() adds new nodes on every call, which is where the duplicates appear
        graph.create(Entity1)
        graph.create(Entity2)
        graph.create(Relationship(Entity1, relationtype, Entity2))


    df = pd.read_csv('Data.csv')
    entities1 = df['Entity1'].tolist()
    types1 = df['Type'].tolist()
    relations1 = df['RelationType'].tolist()
    details1 = df['Details'].tolist()
    entity2 = df['Entity2'].tolist()
    types2 = df['Type2'].tolist()
    details_two = df['Details.1'].tolist()
    for (a, b, c, d, e, f, g) in itertools.zip_longest(entities1, types1, relations1, details1, entity2, types2, details_two):
        print(a, b, c, d, e, f, g)
        data = [((a, b, d), (c), (e, f, g))]
        keys = ["Name", "Type", "Details"]
        payload1(a, b, c, d, e, f, g)
Owais Arshad

1 Answer

Instead of doing a create, do a merge. Simply put, merge will not create a duplicate node if a matching node already exists. Here is the documentation about it:

Reference: https://py2neo.org/v4/_modules/py2neo/database.html#Transaction.merge

For each node, the merge is carried out by comparing that node with a potential remote equivalent on the basis of a single label and property value. If no remote match is found, a new node is created; if a match is found, the labels and properties of the remote node are updated.

OLD code:
    graph.create(Entity1) 

NEW code:
    graph.merge(Entity1) 
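
Since the documentation quoted above says the match is made on a single label and property value, merge has to be told which label and property identify a node. The following is a minimal sketch, not a definitive implementation. It assumes py2neo v4, and it assumes that Name is unique within each type, which the question does not state outright. The helper name merge_row is hypothetical; the parameter names are reused from the question.

```python
def merge_row(graph, entity1, type1, relationtype, details_one,
              entity2, type2, details_two):
    """Merge one CSV row into the graph without duplicating nodes.

    `graph` is expected to be a connected py2neo Graph. Assumption: the
    `Name` property uniquely identifies a node within its type label.
    """
    # Imported inside the function so the sketch can be loaded even where
    # py2neo is not installed; a module-level import works just as well.
    from py2neo import Node, Relationship

    node1 = Node(type1, Name=entity1, Details=details_one)
    node2 = Node(type2, Name=entity2, Details=details_two)

    # merge(subgraph, primary_label, primary_key): match on label + Name
    # and create the node only if no match is found.
    graph.merge(node1, type1, "Name")
    graph.merge(node2, type2, "Name")

    # Both nodes are bound to the database at this point, so the
    # relationship is merged without a label or key of its own.
    graph.merge(Relationship(node1, relationtype, node2))
```

Because a match updates the properties of the remote node, Details on an existing node would take the value from the most recently merged row. Creating the Graph connection once and passing it in, rather than reconnecting for every row, is a design choice of this sketch.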
jose_bacoy
  • That did not work - it asks me for a primary key and label for Entity1. – Owais Arshad Jan 24 '22 at 18:28
  • Your nodes have no constraint or index, so you should define them first so that Neo4j can tell whether a node exists. For example, create a constraint: CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE; CREATE INDEX ON :Person(name); – jose_bacoy Jan 24 '22 at 19:21
  • The node names are type1 and type2. Since I don't have a copy of your csv, I will let you figure it out. – jose_bacoy Jan 24 '22 at 19:22
  • All properties in the merge should be identical to avoid duplicate nodes. Sometimes this means harmonizing the property, moving it to edges, or using ranges/lists for it. For instance, you will create two nodes if one individual's name appears in the source data as both John Doe and John C Doe. Even extra spaces will cause duplicates. – David A Stumpf Jan 24 '22 at 19:34
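
The whitespace problem raised in the last comment can be handled before any node is built. This is a minimal sketch, assuming the names are strings and that collapsing whitespace is enough to make them comparable; it will not reconcile variants such as John Doe and John C Doe. The helper name normalize_name is hypothetical.

```python
def normalize_name(value):
    """Strip leading/trailing whitespace and collapse internal runs of it.

    Assumes `value` is a string; missing values need separate handling.
    """
    return " ".join(value.split())


# Example: three spellings that differ only in whitespace become one key.
raw_names = ["John Doe", "  John Doe", "John   Doe "]
unique_names = {normalize_name(name) for name in raw_names}
```

Applied to the Entity1 and Entity2 columns before the import loop, this keeps the value used as the merge key consistent across rows.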