I have a Neo4j database populated with thousands of nodes but no relationships. I also have a file describing relationships between those nodes, and I would like to create those relationships in the database. My current approach is:

from py2neo import NodeSelector,Graph,Node,Relationship
graph = Graph('http://127.0.0.1:7474/db/data')
tx = graph.begin()
selector = NodeSelector(graph)
with open("file","r") as relations:
    for line in relations:
        # Drop the trailing newline so the last field can match unique_name
        line_split = line.rstrip("\n").split(";")
        node1 = selector.select("Node",unique_name=line_split[0]).first()
        node2 = selector.select("Node",unique_name=line_split[1]).first()
        rs = Relationship(node1,"Relates to",node2)
        tx.create(rs)
tx.commit()

This approach needs two queries per line to fetch the nodes, plus the relationship creation itself. Given that the nodes already exist in the database, is there a more efficient way?

Jausk

You could create a single string for the Cypher query then [Graph.run](http://py2neo.org/v3/database.html#py2neo.database.Graph.run) it. – jonrsharpe Mar 22 '18 at 08:40
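
A minimal sketch of what the comment above suggests, under some assumptions: the input lines have the form `source;target`, the server accepts `$param` style Cypher parameters (older servers use `{param}` instead), and the helper names `parse_pairs` and `create_relationships`, the `QUERY` constant, and the `batch_size` value are all hypothetical and not part of py2neo. Rather than selecting two nodes per line, it sends the pairs as a query parameter and lets Cypher `MATCH` both endpoints and create the relationship on the server, so each batch costs one round trip.

```python
# Cypher statement: one row per relationship, both endpoints matched server-side.
# The backticks are needed because the relationship type contains a space.
QUERY = """
UNWIND $rows AS row
MATCH (a:Node {unique_name: row.source})
MATCH (b:Node {unique_name: row.target})
CREATE (a)-[:`Relates to`]->(b)
"""


def parse_pairs(lines):
    """Turn 'source;target' lines into parameter rows, skipping incomplete lines."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue
        rows.append({"source": fields[0], "target": fields[1]})
    return rows


def create_relationships(graph, rows, batch_size=1000):
    """Send rows to the database in batches; returns how many rows were sent.

    `graph` is expected to behave like py2neo's Graph, i.e. to offer
    run(statement, **kwparameters).
    """
    sent = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        graph.run(QUERY, rows=batch)
        sent += len(batch)
    return sent


# Usage (needs a running Neo4j instance and py2neo installed):
# from py2neo import Graph
# graph = Graph('http://127.0.0.1:7474/db/data')
# with open("file", "r") as relations:
#     create_relationships(graph, parse_pairs(relations))
```

Rows whose `unique_name` values match no node are silently skipped by the `MATCH` clauses, so the returned count is the number of rows sent, not the number of relationships created. An index on `:Node(unique_name)` should keep the per-row lookups cheap.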

1 Answer

You can cache nodes in a dictionary while creating the relationships, so repeated names do not trigger repeated queries:

from py2neo import NodeSelector,Graph,Node,Relationship
graph = Graph('http://127.0.0.1:7474/db/data')
tx = graph.begin()
selector = NodeSelector(graph)
node_cache = {}

with open("file","r") as relations:
    for line in relations:
        # Drop the trailing newline so the last field can match unique_name
        line_split = line.rstrip("\n").split(";")

        # Check if we have this node in the cache
        if line_split[0] in node_cache:
            node1 = node_cache[line_split[0]]
        else:
            # Query and store for later
            node1 = selector.select("Node",unique_name=line_split[0]).first()
            node_cache[line_split[0]] = node1

        if line_split[1] in node_cache:
            node2 = node_cache[line_split[1]]
        else:
            node2 = selector.select("Node",unique_name=line_split[1]).first()
            node_cache[line_split[1]] = node2

        rs = Relationship(node1,"Relates to",node2)
        tx.create(rs)

tx.commit()

With the above, each node is queried at most once, and only if it appears in your input file.

urban