I'm using Neo4j 3.1.1 Community Edition. I'm trying to fetch roughly 80 million entries from the database via Python using the officially supported Python driver. The Python script runs against localhost, i.e. on the same machine as Neo4j and not over the network. Access to Neo4j works fine until it reaches this point:

result = session.run("match (n:Label) return n.Property as property")
property_list = [record["property"] for record in result]

Assembling property_list fails with, in essence, this error:

File "...\neo4j\bolt\connection.py", line 124, in fill
    raise ServiceUnavailable("Failed to read from connection %r" % (self.address,))
neo4j.bolt.connection.ServiceUnavailable: Failed to read from connection Address(host='127.0.0.1', port=7687)

The same code works fine when fetching a smaller dataset and thus assembling a smaller list.

Now I wonder:

  1. Is there an option to keep the bolt session open?
  2. Do I have to configure/tweak the Neo4j Server in a certain way to enable such transactions?
  3. Or is there a magic "third way" to get it done?

1 Answer

Fetching the whole result set in a single query is bad practice: you risk connection timeouts (client- and server-side), out-of-memory (OOM) or denial-of-service (DoS) conditions on the server and in the application, and more.

The proper way to retrieve large sets of data is to page your results: return large but manageable chunks of data, and request the next chunk once you are ready (or asynchronously, while still processing the previous chunk).
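As a rough illustration of paging, here is a minimal sketch rather than a definitive implementation. It assumes keyset pagination on the internal node id (each page asks for nodes whose id is greater than the last one seen), which is just one way to page. The names `PAGE_QUERY` and `iter_pages` and the default page size are made up for this example. The `run` argument is any callable that takes a query string and a parameter dict and returns records indexable by column name. The driver's `session.run` is assumed to fit that shape, and the `$param` syntax is assumed to be available on your server version (older setups may need the `{param}` form).

```python
# Hypothetical paging query: keyset pagination on the internal node id.
# Label and Property are the placeholders used in the question.
PAGE_QUERY = (
    "MATCH (n:Label) WHERE id(n) > $last_id "
    "RETURN id(n) AS node_id, n.Property AS property "
    "ORDER BY node_id LIMIT $limit"
)


def iter_pages(run, page_size=100000):
    """Yield lists of property values, one list per page.

    `run` is any callable (query, params) -> iterable of records that
    support record["column"] access; session.run is assumed to qualify.
    """
    last_id = -1  # node ids are non-negative, so -1 starts before the first node
    while True:
        params = {"last_id": last_id, "limit": page_size}
        # Fully consume this page before issuing the next query.
        rows = [(r["node_id"], r["property"]) for r in run(PAGE_QUERY, params)]
        if not rows:
            return  # an empty page means everything has been read
        yield [prop for _, prop in rows]
        last_id = rows[-1][0]  # continue after the last node seen
```

With the real driver this would be used roughly as `for page in iter_pages(session.run): handle(page)`, where `handle` is whatever processing you need. Each query then returns a bounded amount of data, so no single read keeps the connection busy for the full 80 million rows.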

Tezra