I am running a web app using the following components:

  • Python 3.5.2
  • uWSGI 2.0.11.2
  • cassandra-driver 3.6.0
  • cassandra 3.7

With a Cassandra cluster (3 nodes):

  • Node 1 - IP: 172.17.0.4
  • Node 2 - IP: 172.17.0.5
  • Node 3 - IP: 172.17.0.6

The cluster is configured with NetworkTopologyStrategy and GossipingPropertyFileSnitch.

I have followed the uWSGI connection example from cqlengine:

from cqlengine import connection
from cassandra.io.libevreactor import LibevConnection
from cqlengine.connection import (
    cluster as cql_cluster, 
    session as cql_session
)


try:
    from uwsgidecorators import postfork
except ImportError:
    # We're not in a uWSGI context, no need to hook Cassandra session
    # initialization to the postfork event.
    pass
else:
    @postfork
    def cassandra_init():
        """ Initialize a new Cassandra session in the worker.

        Runs once in each forked uWSGI worker, so every worker gets its
        own cluster connection and session instead of sharing one
        inherited from the master process.
        """
        if cql_cluster is not None:
            cql_cluster.shutdown()
        if cql_session is not None:
            cql_session.shutdown()

        connection.setup(
            ['172.17.0.4'],
            'my_keyspace',
            port=9042,
            connection_class=LibevConnection
        )

However, I get the error "Failed to create connection pool for new host x.x.x.x" for all Cassandra nodes (172.17.0.4, 172.17.0.5 and 172.17.0.6):

Respawned uWSGI worker 2 (new pid: 90)
mapping worker 2 to CPUs: 3 4 5
2016-09-14 21:00:47,434 WARNI [cassandra.cluster][Thread-2] Failed to create connection pool for new host 172.17.0.4:
Traceback (most recent call last):
    File "cassandra/cluster.py", line 2232, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:38826)
    File "cassandra/pool.py", line 328, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6243)
    File "cassandra/cluster.py", line 1107, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:14943)
    File "cassandra/connection.py", line 330, in cassandra.connection.Connection.factory (cassandra/connection.c:5766)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
2016-09-14 21:00:47,437 WARNI [cassandra.cluster][Thread-1] Failed to create connection pool for new host 172.17.0.6:
Traceback (most recent call last):
    File "cassandra/cluster.py", line 2232, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool (cassandra/cluster.c:38826)
    File "cassandra/pool.py", line 328, in cassandra.pool.HostConnection.__init__ (cassandra/pool.c:6243)
    File "cassandra/cluster.py", line 1107, in cassandra.cluster.Cluster.connection_factory (cassandra/cluster.c:14943)
    File "cassandra/connection.py", line 330, in cassandra.connection.Connection.factory (cassandra/connection.c:5766)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
...The work of process 19 is done. Seeya!
worker 7 killed successfully (pid: 19)

According to the debug logs, the driver is able to connect to the nodes, but for some reason it then seems to disconnect and throw the errors above:

2016-09-15 23:23:03,786 DEBUG [cassandra.pool][Thread-2] Initializing connection for host 172.17.0.4
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Not sending options message for new connection(139905425534704) to 172.17.0.4 because compression is disabled and a cql version was not specified
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Sending StartupMessage on <LibevConnection(139905425534704) 172.17.0.4:9042>
2016-09-15 23:23:03,787 DEBUG [cassandra.connection][Thread-2] Sent StartupMessage on <LibevConnection(139905425534704) 172.17.0.4:9042>
2016-09-15 23:23:03,788 DEBUG [cassandra.connection][event_loop] Got ReadyMessage on new connection (139905425534704) from 172.17.0.4
2016-09-15 23:23:03,788 DEBUG [cassandra.pool][Thread-2] Finished initializing connection for host 172.17.0.4
2016-09-15 23:23:03,788 DEBUG [cassandra.cluster][Thread-2] Added pool for host 172.17.0.4 to session
2016-09-15 22:24:29,239 DEBUG [cassandra.io.libevreactor][Thread-2] Closing connection (139945376028152) to 172.17.0.4
2016-09-15 22:24:29,240 DEBUG [cassandra.io.libevreactor][Thread-2] Closed socket to 172.17.0.4
2016-09-15 22:24:29,240 DEBUG [cassandra.connection][Thread-2] Connection to 172.17.0.4 was closed during the startup handshake
2016-09-15 22:24:29,242 WARNI [cassandra.cluster][Thread-2] Failed to create connection pool for new host 172.17.0.4:

EDITED (Added more debugging info about the issue)

The app can ping any of the nodes on port 9042, so it's not a connectivity issue. If I run nodetool status, the three nodes in the cluster seem to be fine:

--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.5  111.72 KiB  256     67.3%             fdd4740d-1ce5-4deb-9a3e-5c18c80ee63e  rack1
UN  172.17.0.4  98.96 KiB   256     66.8%             4fe5a60c-2b6a-4d57-ab6a-e4176ce69b68  rack1
UN  172.17.0.6  94.67 KiB   256     66.0%             5e2675e3-c2a7-4af1-80f0-4cb9573ecf2b  rack1

I have tried with Cassandra 3.7 and 2.2.7 and get the same results with both. But if I run the app against a cluster with only Node 1, it works!

The logs on the Cassandra nodes show the following:

INFO  22:14:33 Unexpected exception during request; channel = [id: 0xd6d3c9ae, L:/172.17.0.6:9042 ! R:/172.17.0.5:42944]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
INFO  22:16:26 Unexpected exception during request; channel = [id: 0x2cfa996f, L:/172.17.0.6:9042 - R:/172.17.0.5:42954]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
INFO  22:54:13 Unexpected exception during request; channel = [id: 0x4134ec0f, L:/172.17.0.6:9042 ! R:/172.17.0.5:42992]

Does somebody know what is going on here? Any help will be appreciated.

Ander

1 Answer

Note that ping uses ICMP rather than TCP, so you might also want to verify TCP connectivity with nc -nz <ip> 9042. However, since you are getting timeouts and not "Connection refused", I will assume it's not a connectivity problem.
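
If nc is not available in the app container, roughly the same TCP check can be done from Python. This is a minimal sketch using only the standard library; the helper name tcp_port_open and the 5-second default timeout are assumptions chosen for illustration, not part of the driver or of uWSGI.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection, an unreachable host, or a timeout all count
    as "not open" here, since each of them raises OSError (or a
    subclass such as socket.timeout or ConnectionRefusedError).
    """
    try:
        # create_connection performs the full TCP handshake and gives up
        # once `timeout` seconds have elapsed.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling tcp_port_open("172.17.0.4", 9042) for each of the three node addresses from inside the app container shows whether the native transport port is reachable over TCP. It says nothing about whether the CQL handshake itself completes.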

The main thing you should check is whether your uWSGI config has any kind of monkey patching enabled (gevent, for example). The libevreactor used in the example uses the standard libraries and assumes no patching.

I think you will be able to resolve this by either disabling patching, or removing the explicit connection_class parameter, in which case the driver will detect patching and default the reactor implementation accordingly.
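
One rough way to check for patching from inside the worker is to look at which module currently provides socket.socket. The sketch below is a heuristic, not an official driver or uWSGI API: the function name describe_socket_patching is hypothetical, and it assumes that an unpatched interpreter reports "socket" as the defining module of socket.socket, while libraries such as gevent or eventlet replace the class with one defined in their own package.

```python
import socket
import sys


def describe_socket_patching():
    """Collect hints about whether the socket module was monkey patched.

    Heuristic only: the class behind socket.socket normally lives in the
    standard "socket" module. A different defining module suggests that
    something replaced it.
    """
    module = getattr(socket.socket, "__module__", "") or ""
    return {
        # Module that defines the class currently bound to socket.socket.
        "socket_class_module": module,
        # True when that module is not the standard library's socket.
        "looks_patched": module.split(".")[0] != "socket",
        # Whether the usual patching libraries were imported at all.
        "gevent_imported": "gevent" in sys.modules,
        "eventlet_imported": "eventlet" in sys.modules,
    }
```

Logging the result of describe_socket_patching() at the start of the postfork hook, before connection.setup is called, shows what each worker actually sees. If looks_patched is True, dropping the explicit connection_class argument as described above is the first thing to try.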

Adam Holmberg
  • Thanks for the help @Adam Holmberg, I really appreciate it. I have tried using `AsyncoreConnection` instead of `LibevConnection` but I still get the same issue. I am not using any kind of monkey patching in uWSGI (unless it is enabled by default in uWSGI version **2.0.11.2**). When I run `nodetool status` I can see that the 3 nodes are correctly displayed in the cluster. – Ander Sep 15 '16 at 21:39
  • Strange. This is almost always some combination of patching/reactors. Try printing `connection.cluster.connection_class` after setup. Is it correct? Can you make sure that socket.socket is socket._socketobject and not some patched ref? Turn on debug logging for the driver? Have you tried writing a script to connect outside uWSGI context? – Adam Holmberg Sep 16 '16 at 16:27
  • I have checked `connection.Cluster.connection_class` and it looks like it is using **cassandra.io.libevreactor** even when I pass the param **connection_class=AsyncoreConnection**! So I guess this param has been removed from the driver in one of the latest versions. From `socket.socket` I get `` . I have tried removing libev from my server to see if it falls back to Asyncore, but no luck, it crashes: `ImportError: The C extension needed to use libev was not found.` – Ander Sep 19 '16 at 00:00
  • I have posted my uWSGI config. So I guess the main question now is how I can make it run in Asyncore mode. Or even better, how I can make it work with libev, given that uWSGI is not monkey patching. Does uWSGI need some sort of plugin, or should it work straight away? And why does this only happen when there is more than 1 node? Thanks for taking the time @Adam, this is driving me crazy. – Ander Sep 19 '16 at 00:59