I have a JSON file with data for around 1.4 million nodes, and I want to build a Neo4j graph database from it. I tried to use py2neo's batch submit function. My code is as follows:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for i in words:
    nodedict[i] = batch.create({"name":i})
results = batch.submit()

The error shown is as follows:

Traceback (most recent call last):
  File "test.py", line 36, in <module>
    results = batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 364, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Loc$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 278, in __init__
    raise SystemError(body)
SystemError: None

Can anybody please tell me what exactly is happening here? Does it have anything to do with the fact that the batch query is pretty large? If so, what can be done? Thanks in advance! :)

  • Well, I guess I kinda found the answer here: http://stackoverflow.com/questions/17902741/py2neo-neo4j-system-error-create-batch-nodes-relationships?rq=1 I guess I'll try with breaking the huge batch query into smaller chunks of 5k queries and then run the multiple batch submit processes. Hope it works :) – AnotherCodingEnthusiast Jul 09 '14 at 15:51

3 Answers

So here's what I figured out (Thanks to this question: py2neo - Neo4j - System Error - Create Batch Nodes/Relationships):

The py2neo batch submit function has its own limits on how many queries a single batch can carry. While I wasn't able to find an exact figure for the upper limit, I capped each batch at 5000 queries. So I decided to run the following piece of code:

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later

for index, i in enumerate(words, start=1):
    nodedict[i] = batch.create({"name": i})
    # Counting from 1 avoids submitting a one-node batch on the first iteration
    if index % 5000 == 0:
        batch.submit()
        batch = neo4j.WriteBatch(graph_db)  # As stated by Nigel below, I'm creating a new batch
batch.submit()  # for the final, partially filled batch

This way, I sent batch requests (of size 5k queries) and was successfully able to get my entire graph created!

There's no real way to describe a limit on the number of jobs that a batch can contain - it can vary wildly based on a number of factors. The best bet in general is to experiment to find an optimum size for your use case and go with that. It looks like this is what you are already doing :-)
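
One way to run that kind of experiment is to time the same workload at a few candidate batch sizes. The sketch below (Python 3) is only an illustration: `time_batch_sizes` and `fake_submit` are hypothetical names, and `fake_submit` stands in for whatever function actually builds a fresh batch from a chunk and submits it.

```python
import time


def time_batch_sizes(items, submit_chunk, sizes):
    """Return {batch_size: elapsed_seconds} for submitting all items at each size.

    submit_chunk is a caller-supplied function (hypothetical here) that
    sends one list of items to the server.
    """
    timings = {}
    for size in sizes:
        started = time.perf_counter()
        for pos in range(0, len(items), size):
            submit_chunk(items[pos:pos + size])
        timings[size] = time.perf_counter() - started
    return timings


# Stand-in for a real submission; it only records how many items it was given.
submitted_counts = []


def fake_submit(chunk):
    submitted_counts.append(len(chunk))


sample = ["word%d" % n for n in range(25)]
timings = time_batch_sizes(sample, fake_submit, sizes=[5, 10])
```

Against a real server, each candidate size would probably need a comparable starting state (for example a fresh test database and a representative sample of the data) for the timings to be meaningful.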

In terms of your solution, I'd recommend one tweak. Batch objects weren't designed to be reused so instead of clearing the batch after every submission, simply create a new one. The ability to submit a batch multiple times will be removed in the next version of py2neo anyway.

Nigel Small
  • Thanks for your insight! I had one more doubt though. In the next step, I wanted to create relationships using the nodes stored in the dictionary. However, because I'm submitting the batch, it shows a Value error: Request not found. Can you please suggest something? – AnotherCodingEnthusiast Jul 10 '14 at 08:58
  • Oh I guess I figured it out. Thanks to this question: http://stackoverflow.com/questions/23812614/batching-in-py2neo – AnotherCodingEnthusiast Jul 10 '14 at 09:51
  • It is chiefly because of the issue raised in the question you reference that batches will become single-use in the next version of py2neo. It will be necessary to discard a batch after submission and create another if required. – Nigel Small Jul 11 '14 at 06:15
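
A sketch of one possible way around that error, under stated assumptions: rather than keeping references into a batch that has already been submitted, keep what each submission returns, keyed by node name. The code assumes `submit()` returns one result per queued job, in the order the jobs were added; that should be checked against the py2neo version in use. `FakeBatch` and `create_nodes_in_chunks` are hypothetical names so the example runs without a server; in real use `make_batch` would be something like `lambda: neo4j.WriteBatch(graph_db)`.

```python
class FakeBatch(object):
    """Hypothetical stand-in for a write batch; it talks to no server."""

    def __init__(self):
        self.jobs = []

    def create(self, properties):
        self.jobs.append(properties)

    def submit(self):
        # Assumption: one result per job, in the order the jobs were queued.
        return [{"created": props} for props in self.jobs]


def create_nodes_in_chunks(names, make_batch, chunk_size=5000):
    """Create one node per name and return {name: submit result}."""
    created = {}
    for pos in range(0, len(names), chunk_size):
        chunk = names[pos:pos + chunk_size]
        batch = make_batch()  # a fresh batch per chunk, never reused
        for name in chunk:
            batch.create({"name": name})
        # Pair each name with the result at the same position.
        created.update(zip(chunk, batch.submit()))
    return created


nodes = create_nodes_in_chunks(["a", "b", "c"], FakeBatch, chunk_size=2)
```

With real results stored in the dictionary, relationships could then be queued in later batches using those stored values instead of job references from an earlier batch.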

I ran into the same issue after I started using batch create via graph.create(*alist). The answers above pointed me in the right direction, and I ended up using this snippet, inspired by https://gist.github.com/anonymous/6293739 from the question "py2neo - Neo4j - System Error - Create Batch Nodes/Relationships":

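chunk_size = 500
# Lazily slice alist into consecutive chunks of at most chunk_size items
# (xrange is Python 2; on Python 3 use range instead)
chunks = (alist[pos:pos + chunk_size] for pos in xrange(0, len(alist), chunk_size))
for c in chunks:
    graph.create(*c)  # create this chunk's entities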

PS py2neo==2.0.7
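
For reference, a variant of the same chunking idea that does not depend on xrange or on alist supporting slicing, so it also works for generators. This is a sketch, and `chunked` is a hypothetical helper name, not part of py2neo.

```python
from itertools import islice


def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


batches = list(chunked(range(7), 3))
```

Each chunk could then be passed on the same way as in the snippet above, for example graph.create(*chunk).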

citynorman