1

We're in the process of rewriting a web application in Java, coming from PHP. I think, but I'm not really sure, that we might run into problems in regard to connection pooling. The application in itself is multitenant, and is a combination of "Separate database" and "Separate schema".

For every Postgres database server instance, there can be more than 1 database (named schemax_XXX) holding more than 1 schema (where the schema is a tenant). On signup, one of two things can happen:

  1. A new tenant schema is created in the highest numbered schema_XXX database.
  2. The signup process sees that a database has been fully allocated and creates a new schemas_XXX+1 database. In this new database, the tenant schema is created.

All tenants are known via a central registry (also a Postgres database). When a session is established the registry will resolve the host, database and schema of the tenant and a database session is established for that HTTP request.

Now, the problem I think I'm seeing here is twofold:

  1. A JDBC connection pool is defined when the application starts. With that I mean that all databases (host+database) are known at startup. This conflicts with the signup process.
  2. When I'm writing this we have ~20 database servers with ~1000 databases (for a total sum of ~100k (tenant) schemas. Given those numbers, I would need 20*1000 data sources for every instance of the application. I'm assuming that all pools are also, at one time or another, also started. I'm not sure how much resources a pool allocates, but it must be a non trivial amount for 20k pools.

So, is it feasable to even assume that a connection pool can be used for this?

For the first problem, I guess that a pool with support for JMX can be used, and that we create a new datasource when and if a new schemas_XXX database is created. The larger issue is that of the huge amount of pools. For this, I guess, some sort of pool manager should be used that can terminate a pool that have no open connections (and on demand also start a pool). I have not found anything that supports this.

What options do I have? Or should I just bite the bullet and fall back to an out of process connection pool such as PgBouncer and establish a plain JDBC connection per request, similar to how we're handling it now with PHP?

c0dem4gnetic
  • 932
  • 1
  • 7
  • 24

1 Answers1

1

A few things:

  1. A Connection pool need not be instantiated only at application start-up. You can create or destroy them whenever you want;
  2. You obviously don't want to eagerly create one Connection pool per database or schema to be open at all times. You'd need to keep at least 20000 or 100000 Connections open if you did, a nonstarter even before you get to the non-Connection resources used by the DataSource;
  3. If, as is likely, requests for Connections for a particular tenant tend to cluster, you might consider lazily, dynamically instantiating pools, and destroying them after some timeout if they've not handled a request for a while.

Good luck!

Steve Waldman
  • 13,689
  • 1
  • 35
  • 45