
I've inherited a well-established Flask-based API service that makes extensive use of mongoengine. We are turning this single-database API into a multi-tenant service, and we're unclear on best practice. For a number of reasons, tenant data will be physically segregated into separate databases, one db per tenant. (There is also a connection to a 'core' db for some Documents.)

The use case workflow is simple:

  • receive a request
  • validate the API access token and reconcile a user/tenant
  • switch the database connection to the correct tenant db
  • do the Document operations

It seems the best way to implement this is to use aliases, but when I context switch I need to disconnect('tenant_db') then connect(alias='tenant_db'). This feels wrong.
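
To make the pattern concrete, this is roughly what that per-request switch looks like (a sketch only; tenant, db_name and mongo_uri are stand-ins for our own objects):

    from mongoengine import connect, disconnect

    def bind_tenant(tenant):
        # Tear down whatever the 'tenant_db' alias currently points at...
        disconnect(alias='tenant_db')
        # ...then register the alias again against this tenant's own database.
        connect(db=tenant.db_name, alias='tenant_db', host=tenant.mongo_uri)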

Regardless, the code works, but now I have a unit-test issue. When testing with the mongomock://localhost connection (as per the documentation), the code times out trying to connect to a real MongoDB running on localhost. I suspect this has something to do with the mock connection not having the appropriate scope, but I can't find much documentation about testing against a mock db.
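
For reference, the fixture follows the mongomock example in the docs, something like this (note the mongomock:// host scheme is only accepted by older mongoengine releases; newer ones expect mongo_client_class=mongomock.MongoClient instead):

    import pytest
    from mongoengine import connect, disconnect

    @pytest.fixture
    def mock_db():
        # Per the docs, this should create an in-memory mongomock client,
        # yet in my setup it still tries to reach a real mongod on localhost.
        conn = connect('mongoenginetest', host='mongomock://localhost', alias='default')
        yield conn
        disconnect(alias='default')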

Sorry for two questions in one:

  1. Is the disconnect-then-reconnect alias pattern the correct approach?
  2. Are there better practices (or clearer examples) for pytest + mongoengine + mongomock?

Not a pro with pytest but also not a novice. Solid with pymongo but brand new to mongoengine.

Thanks!


1 Answer


A little disappointed to not get any responses from the mongoengine community - perhaps it's not as active as I'd hoped.

Here are the results of further investigation, and the approach we decided on.

First spike - a single tenant_db alias, with the underlying connection changed on each request. Pros: the existing code kept working without much refactoring. Cons: changing the details of a registered connection is impossible without a disconnect and reconnect, and we kept tripping over warnings about the default database (there is no meaningful "default" when everything is per-request, yet one is required). This was messy and did not succeed.

Second spike - bite the bullet and refactor every single Document operation to sit inside the with switch_db(alias) pattern, using a unique alias for every tenant. Pro #1: the explicit nature of this gives more confidence that each document operation happens in the right database. Pro #2: the incessant warnings about the "default" database actually work to our advantage - any rogue Document operation outside a switch_db context throws an error. Cons: it still requires a disconnect and reconnect before the with context.
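
As a rough sketch of that shape (the names and the alias scheme here are illustrative, not our production code):

    from mongoengine import Document, StringField, register_connection
    from mongoengine.context_managers import switch_db

    class Widget(Document):
        name = StringField(required=True)

    def tenant_alias(tenant_id):
        alias = f"tenant_{tenant_id}"
        # Registering an alias only records the settings; the pymongo client
        # is created lazily on first use.
        register_connection(alias, db=f"tenant_{tenant_id}", host="mongodb://localhost:27017")
        return alias

    def create_widget(tenant_id, name):
        # Every Document operation is explicit about which alias it targets.
        with switch_db(Widget, tenant_alias(tenant_id)) as TenantWidget:
            return TenantWidget(name=name).save()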

Third spike - we considered forking mongoengine and altering the Document class to be more pure, allowing injection of the db connection instead of relying on an external registry of named connections. Pros: we'd be in full control of the connection logic. Cons: db handling is deep in the DNA of mongoengine - this looked unlikely to succeed without significant effort.

We chose Spike Two.

Since gunicorn -> WSGI -> Flask gives us reliable per-request isolation, and the new "default" connection for the tenant is only made after request auth succeeds, leveraging with switch_db(alias) works. This gives us a short-term fix that lets this single-tenant codebase operate in a multi-tenant manner.
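
Wired into Flask, the request side looks roughly like this (resolve_tenant() and the tenant attributes are placeholders for our token validation and tenant metadata):

    from flask import Flask, abort, g, request
    from mongoengine import register_connection

    app = Flask(__name__)

    def resolve_tenant(token):
        # Placeholder: look up the tenant record for this API token in the core db.
        ...

    @app.before_request
    def bind_tenant():
        tenant = resolve_tenant(request.headers.get("Authorization"))
        if tenant is None:
            abort(401)
        alias = f"tenant_{tenant.id}"
        register_connection(alias, db=tenant.db_name, host=tenant.mongo_uri)
        # Handlers read g.tenant_alias and pass it to switch_db().
        g.tenant_alias = alias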

We are also not 100% confident we understand how the underlying pymongo connection pooling will behave, performance-wise. More studying to be done there.

Finally, regarding the pytest confusion: in Spike One, testing with mocks was impossible due to the interplay of fixtures, scopes, and disconnect/reconnect. Spike Two tests better, except that we had to add an environment variable to the application code so it won't try to make real connections when we are in "unit_test_mode".
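
The test side ended up looking roughly like this (UNIT_TEST_MODE is our own flag, and the mongomock:// host form again assumes an older mongoengine; newer releases want mongo_client_class=mongomock.MongoClient):

    import pytest
    from mongoengine import connect, disconnect

    @pytest.fixture(autouse=True)
    def mock_tenant_db(monkeypatch):
        # Tell the application code not to register real connections.
        monkeypatch.setenv("UNIT_TEST_MODE", "1")
        # Register the tenant alias against an in-memory mongomock backend instead.
        connect(db="tenant_test", alias="tenant_acme", host="mongomock://localhost")
        yield
        disconnect(alias="tenant_acme")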
