I'm developing a Flask microservice that exposes some data from a Cassandra database. It is served with gunicorn, invoked from the command line. I'm confused about where and how to connect to the database under gunicorn, particularly when it comes to mocking or bypassing the database in unit tests.
My first attempt was to connect at app creation time, as follows:
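    import os

    from cassandra.cluster import Cluster
    from flask import Flask

    def create_app():
        app = Flask(__name__)
        app.debug = True
        # Connects eagerly: a session is opened as soon as the app is created.
        cluster = Cluster([os.environ['CASSANDRA_HOST']])
        app.cassandra = cluster.connect(os.environ['CASSANDRA_KEYSPACE'])
        return app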
thinking that it would create a single "global" connection and reduce the overhead of each individual request. This clearly breaks unit testing, because creating the app tries to connect to the database, which defeats the purpose of unit tests running in the isolated environment of a CI pipeline.
Then I found some slides from the Cassandra people saying that, for Flask under gunicorn, we should use @app.before_first_request and that, as a general rule, we should connect "post-fork". I'm not sure what "post-fork" means in this context.
Anyhow, doing:
    @app.before_first_request
    def before_request():
        app.cluster = Cluster([os.environ['CASSANDRA_HOST']])
        app.cassandra = app.cluster.connect(os.environ['CASSANDRA_KEYSPACE'])
also works, but I still have the same problem with isolated unit testing.
From reading this post, I gather that the connection itself is not the problem, since I'm creating one Cassandra session per Flask instance spawned by gunicorn, post-fork.
So my question reduces to this: which strategy should I follow to unit test the endpoints without actually having to reach the database?
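To make the question more concrete, here is a rough sketch of the kind of thing I have in mind, not a settled design. It defers the connection until first use and lets a test inject a fake session. Everything in it is hypothetical: `LazySession`, `list_users`, and the query are made-up names, and I left out Flask and the Cassandra driver on purpose so that it runs without a cluster. In the real service, the `connect` callable would presumably wrap `Cluster(...).connect(...)`.

```python
from unittest.mock import MagicMock


class LazySession:
    """Hypothetical helper: defers the connection until first use."""

    def __init__(self, connect):
        # `connect` is any zero-argument callable returning a session-like
        # object; in production it would wrap Cluster(...).connect(...).
        self._connect = connect
        self._session = None

    def get(self):
        # Connect on first access only, so nothing touches the network at
        # import or app-creation time. Under gunicorn the first access
        # would happen in the worker that handles the first request.
        if self._session is None:
            self._session = self._connect()
        return self._session


def list_users(db):
    """Stand-in for the body of a view function; the query is made up."""
    rows = db.get().execute("SELECT name FROM users")
    return [row["name"] for row in rows]


# In a unit test, inject a fake instead of a real Cassandra session.
fake_session = MagicMock()
fake_session.execute.return_value = [{"name": "alice"}, {"name": "bob"}]
connect = MagicMock(return_value=fake_session)

db = LazySession(connect)
assert connect.call_count == 0  # nothing has connected yet
assert list_users(db) == ["alice", "bob"]
assert list_users(db) == ["alice", "bob"]
assert connect.call_count == 1  # connected once, then reused
```

I don't know whether injecting the connection like this is the idiomatic approach for Flask under gunicorn, or whether patching `Cluster` in the tests would be preferable.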