
I just started learning py2neo and Neo4j, and I'm having a problem with duplicates. I'm writing a simple Python script that builds a database of scientific papers and authors. I only need to add the paper and author nodes and the relationships between them. I was using this code, which works fine but is very slow:

    paper = Node('Paper', id=post_id)
    graph.merge(paper)
    paper['created_time'] = created_time
    graph.push(paper)

    for author_id,author_name in paper_dict['authors']:
        researcher = Node('Person', id=author_id)
        graph.merge(researcher)
        researcher['name'] = author_name
        graph.push(researcher)

        wrote = Relationship(researcher,'author', paper)
        graph.merge(wrote)

So, to write multiple relationships at once, I'm trying to use transactions. The problem is that if I run the script multiple times for the same papers and authors, it treats them as different entities and duplicates every node and relationship in the database. The same doesn't happen with the previous code. This is the code that uses transactions:

    tx = graph.begin()

    paper = Node('Paper', id=post_id)
    paper['created_time'] = created_time

    tx.create(paper)

    for author_id,author_name in paper_dict['authors']:
        researcher = Node('Person', id=author_id)
        researcher['name'] = author_name
        tx.create(researcher)
        wrote = Relationship(researcher,'author', paper)
        tx.create(wrote)
    tx.commit()
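
A toy in-memory model (plain Python, not py2neo) can make the difference concrete: a `create`-style call always adds a node, while a `merge`-style call first looks for an existing node with the same label and key property and only creates one if none is found. The class `ToyGraph`, the helper `load`, and the sample authors are hypothetical illustrations, not part of py2neo's API.

```python
# Toy in-memory model (NOT py2neo): illustrates CREATE vs MERGE semantics.
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    # Each node is stored as (label, properties-dict); duplicates are allowed.
    nodes: list = field(default_factory=list)

    def create(self, label, **props):
        # CREATE-like: always appends a new node, even if an identical one exists.
        node = (label, dict(props))
        self.nodes.append(node)
        return node

    def merge(self, label, key, **props):
        # MERGE-like: reuse a node with the same label and key value if present,
        # updating its other properties; otherwise fall back to creating it.
        for existing_label, existing_props in self.nodes:
            if existing_label == label and existing_props.get(key) == props.get(key):
                existing_props.update(props)
                return (existing_label, existing_props)
        return self.create(label, **props)

    def count(self, label):
        return sum(1 for node_label, _ in self.nodes if node_label == label)


def load(graph, use_merge):
    # Hypothetical sample data standing in for paper_dict['authors'].
    authors = [("a1", "Alice"), ("a2", "Bob")]
    for author_id, author_name in authors:
        if use_merge:
            graph.merge("Person", "id", id=author_id, name=author_name)
        else:
            graph.create("Person", id=author_id, name=author_name)


created = ToyGraph()
merged = ToyGraph()
for _ in range(3):  # simulate running the script three times
    load(created, use_merge=False)
    load(merged, use_merge=True)

print(created.count("Person"))  # every run adds two new nodes
print(merged.count("Person"))   # reruns find the existing nodes
```

After three runs the create-based graph holds six `Person` nodes while the merge-based one still holds two, which mirrors the difference between the two versions of the script.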
Miguel

1 Answer


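I believe you should use `merge` rather than `create` to avoid duplicates: `create` always adds new nodes and relationships, while `merge` first looks for a matching one and only creates it if none exists. Consider the following source code: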

    import py2neo

    from py2neo import Graph, Node, Relationship

    def authenticateAndConnect():
        # py2neo v3-era API; 'user' and 'password' are placeholders
        py2neo.authenticate('localhost:7474', 'user', 'password')
        return Graph('http://localhost:7474/db/data/')

    def createData():
        graph = authenticateAndConnect()
        tx = graph.begin()
        movie = Node('Movie', title='Answer')

        people = [{'name': 'Dan', 'born': 2001}, {'name': 'Brown', 'born': 2001}]
        # Merge the same data repeatedly to show that no duplicates appear
        for i in range(10):
            for person_data in people:
                person = Node('Person', name=person_data['name'], born=person_data['born'])
                tx.merge(person)   # match-or-create instead of tx.create(person)
                actedIn = Relationship(person, 'ACTED_IN', movie)
                tx.merge(actedIn)  # likewise for the relationship

        tx.commit()

    if __name__ == '__main__':
        for i in range(10):
            createData()
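
If speed is the main concern, one common pattern (not something the code above does) is to send a single parameterized Cypher statement that combines `UNWIND` with `MERGE` for a whole batch of papers, instead of one call per node. The sketch below only builds the query string and the parameter list; the record layout, the name `build_papers_param`, and the use of `id` as the unique key are assumptions. It also assumes Neo4j 3.x or later for the `$papers` parameter syntax.

```python
# Sketch: one parameterized Cypher statement that MERGEs many papers, authors
# and "author" relationships in a single round trip.
# Assumptions: Neo4j 3.x+ ($param syntax); `id` uniquely identifies
# Paper and Person nodes.

MERGE_PAPERS_QUERY = """
UNWIND $papers AS p
MERGE (paper:Paper {id: p.id})
SET paper.created_time = p.created_time
WITH paper, p
UNWIND p.authors AS a
MERGE (person:Person {id: a.id})
SET person.name = a.name
MERGE (person)-[:author]->(paper)
"""


def build_papers_param(papers):
    """Turn hypothetical input records into the list passed as $papers.

    Each record is assumed to look like:
        {"id": ..., "created_time": ..., "authors": [(author_id, author_name), ...]}
    """
    return [
        {
            "id": paper["id"],
            "created_time": paper["created_time"],
            "authors": [{"id": aid, "name": name} for aid, name in paper["authors"]],
        }
        for paper in papers
    ]


# Hypothetical sample input.
papers = [
    {"id": "p1", "created_time": "2016-01-01", "authors": [("a1", "Alice"), ("a2", "Bob")]},
    {"id": "p2", "created_time": "2016-02-01", "authors": [("a1", "Alice")]},
]
params = build_papers_param(papers)
print(params[0]["authors"])

# With a live database (not executed here), something like:
#     graph.run(MERGE_PAPERS_QUERY, papers=params)
```

Because every clause is a `MERGE` keyed on `id`, rerunning the same batch should not duplicate nodes or relationships. An index or uniqueness constraint on `:Paper(id)` and `:Person(id)` usually makes these `MERGE` lookups much faster; the exact syntax for declaring one differs between Neo4j versions.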
Portable