2

Recently a script which uses py2neo package and connects via the Bolt protocol began failing unexpectedly with a ProtocolError: Server closed connection, and I'm struggling to understand why. Note the script works when using HTTP as the protocol.

The script pulls the Neo4j graph, augments it in Python and then attempts to push the relevant changes back to the database. The whole process takes about an hour, and was working reliably for quite some time, however the push stage recently starting failing.

If I create a toy example, bypasses the pull and augmentation, then the push works and thus I'm thinking the server closed the connection due to a timeout, however I can't find any timeout related parameter for Bolt in py2neo. Note I have set the HTTP socket timeout to be 9999 seconds (~ 2.75 hours),

from py2neo.packages.httpstream import http
http.socket_timeout = 9999

though my understanding of this is i) it's unrelated to Bolt and ii) the timeout is at the connection level which greatly exceeds the time the script ran.

For reference I'm using Neo4j v3.0.3 and py2neo v3.1.2. The stack-trace was:

File "/usr/local/lib/python2.7/dist-packages/py2neo/database/__init__.py", line 1017, in __exit__
self.commit()
File "/usr/local/lib/python2.7/dist-packages/py2neo/database/__init__.py", line 1059, in commit
self._post(commit=True)
File "/usr/local/lib/python2.7/dist-packages/py2neo/database/__init__.py", line 1291, in _post
self.finish()
File "/usr/local/lib/python2.7/dist-packages/py2neo/database/__init__.py", line 1296, in finish
self._sync()
File "/usr/local/lib/python2.7/dist-packages/py2neo/database/__init__.py", line 1286, in _sync
connection.fetch()
File "/usr/local/lib/python2.7/dist-packages/py2neo/packages/neo4j/v1/bolt.py", line 323, in fetch
raw.writelines(self.channel.chunk_reader())
File "/usr/local/lib/python2.7/dist-packages/py2neo/packages/neo4j/v1/bolt.py", line 174, in chunk_reader
chunk_header = self._recv(2)
File "/usr/local/lib/python2.7/dist-packages/py2neo/packages/neo4j/v1/bolt.py", line 157, in _recv
raise ProtocolError("Server closed connection")
ProtocolError: Server closed connection

and the stripped down Python code is of the form,

import py2neo
from py2neo.packages.httpstream import http
http.socket_timeout = 3600

graph = py2neo.Graph(
    host='localhost',
    bolt=True,
    bolt_port=4096,
    http_port=4095,
)

# Pull the graph from the Neo4j database via graph.run(...) statements,
# augments the graph etc.
# ...

# Exception is thrown when the following push transaction is executed.
with graph.begin() as tx:
    statement = """
        UNWIND {rows} AS row
        WITH row.source AS source, row.target AS target
        MATCH (s:Entity)
        USING INDEX s:Entity(uuid)
        WHERE s.uuid = source
        MATCH (t:Entity)
        USING INDEX t:Entity(uuid)
        WHERE t.uuid = target
        MATCH (s)-[r:FAVORITED]->(t)
        DELETE r
    """

    rows = [{
        'source': '8267d7d0-a837-11e6-b841-22000bcec6a9',
        'target': 'c6296c97-a837-11e6-b841-22000bcec6a9',
    }]

    tx.run(statement, rows=rows)

Does anyone have any suggestions as to how I can further debug this or what causes the connection to close? I've looked through the _recv function but it's not apparent why no data was received by the socket.

Looking through the Neo4j debug.log file the only possible related error was

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)

I've also checked the uptime of the service, and confirmed that it was running throughout the entire duration of the script (HH:MM:SS),

 > ps -p "14765" -o etime=
 16:55:31
John
  • 783
  • 6
  • 12

1 Answers1

0

I believe the issue is with:

tx.run(statement, rows=rows)

The second argument to CypherResoure.run() is a dictionary of parameters. You're passing them in as additional python params. See py2neo code.

Try:

tx.run(statement, {"rows": rows});
Ryan Boyd
  • 2,978
  • 1
  • 21
  • 19
  • We're using v3 of py2neo which has the same [signature](http://py2neo.org/v3/database.html#py2neo.database.Graph.run). The `parameters` argument is optional, whereas `rows` are interpreted as ``kwargs`. Note the code works fine when using HTTP as opposed to Bolt. – John Feb 09 '17 at 22:35