Post-fork connection to Cassandra under gunicorn

Question

I'm developing a Flask microservice to expose some data from a Cassandra database. It is served with gunicorn, invoked from the command line. I'm confused about where and how to connect to the database under gunicorn, particularly when it comes to mocking or bypassing the database during unit testing.

My first attempt was to connect at app creation time, as follows:

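import os

from cassandra.cluster import Cluster
from flask import Flask

def create_app():
    app = Flask(__name__)

    app.debug = True

    # Connects as soon as the app is created, before any request arrives
    cluster = Cluster([os.environ['CASSANDRA_HOST']])
    app.cassandra = cluster.connect(os.environ['CASSANDRA_KEYSPACE'])

    return app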

My thinking was that this would create a single "global" connection and avoid connection overhead on each individual request. It clearly breaks unit testing, though, because creating the app tries to connect to the database, which defeats the purpose of running isolated unit tests in a CI pipeline.

Then I found some slides from the Cassandra people saying that, for Flask under gunicorn, we should use @app.before_first_request, and that as a general rule we should connect "post-fork". I'm not sure what "post-fork" means in this context.

Anyhow, doing:

# Runs once per app instance, before the first request it handles
@app.before_first_request
def connect_to_cassandra():
    app.cluster = Cluster([os.environ['CASSANDRA_HOST']])
    app.cassandra = app.cluster.connect(os.environ['CASSANDRA_KEYSPACE'])

also works, but I still have the same problem with isolated unit testing.

From reading through this post, I gather that the connection itself is not the problem, since I'm creating one Cassandra session per Flask instance, and each instance is spawned by gunicorn post-fork.

So my question reduces to this: which strategy should I follow to unit test the endpoints without actually reaching the database?

score 1 · Answer 1 · answered Nov 07 '17 at 22:03

The general idea is to have one Cluster/Session per process, created on fork and held for the process lifetime. Most servers offer a post-fork 'hook' for setting up resources like this.

The Gunicorn hook is documented here
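As a rough sketch of that idea, assuming the session is kept in a small module of its own: `post_fork(server, worker)` is the Gunicorn server hook, while `connect`, `get_session`, `_default_factory`, and the `factory` parameter are hypothetical names introduced here for illustration. The environment variables are the ones used in the question.

```python
import os

_session = None  # one Cassandra session per worker process


def _default_factory():
    # Imported lazily so that importing this module never requires the driver
    # (hypothetical helper, not part of the driver or of Gunicorn).
    from cassandra.cluster import Cluster

    cluster = Cluster([os.environ["CASSANDRA_HOST"]])
    return cluster.connect(os.environ["CASSANDRA_KEYSPACE"])


def connect(factory=_default_factory):
    """Create the per-process session; `factory` lets tests inject a fake."""
    global _session
    _session = factory()
    return _session


def get_session():
    """Return the session created for this process."""
    if _session is None:
        raise RuntimeError("connect() has not been called in this process")
    return _session


def post_fork(server, worker):
    # Gunicorn calls this in each worker right after it has been forked,
    # so every worker gets its own Cluster/Session.
    connect()
```

In practice `post_fork` would sit in the Gunicorn config file (for example one passed with `-c`) and import `connect` from the application package; they are shown together here only to keep the sketch short. Because nothing connects at import time, a test can call `connect(factory=...)` with a stand-in object and never touch the database.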

The uWSGI analog is referenced in the driver FAQ.
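For the unit-testing part of the question, one possible approach (a sketch, not the only option) is to keep the query logic in functions that receive the session as an argument, and hand them a mock in tests. The function `fetch_user_names`, the `users` table, and its `name` column are made up for illustration; `session.execute` is the driver call the real session would provide.

```python
from types import SimpleNamespace
from unittest import mock


def fetch_user_names(session):
    # `session` is anything with an execute() method: the real Cassandra
    # session in production, a mock in unit tests.
    rows = session.execute("SELECT name FROM users")  # hypothetical table
    return [row.name for row in rows]


def test_fetch_user_names():
    # The mock stands in for the session, so no database is contacted.
    fake_session = mock.MagicMock()
    fake_session.execute.return_value = [
        SimpleNamespace(name="ada"),
        SimpleNamespace(name="grace"),
    ]

    assert fetch_user_names(fake_session) == ["ada", "grace"]
    fake_session.execute.assert_called_once_with("SELECT name FROM users")
```

A Flask view would then pass in whatever session the worker holds (for example `app.cassandra` from the question), while the tests never need a reachable cluster.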
