18

Consider a mobile device manager system that contains information for every user such as a table that stores the apps that he has installed on the phone, auditing details, notification information etc. Is it wise to create a seperate schema for each user with the corresponding tables? The number of tables is large for a single user amounting to about 30 tables each. Would it be better to have a seperate schema where all this information is placed into these tables (in turn creating enormous tables?) or have a schema for each user?

Thanks in advance

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
michelle
  • 2,759
  • 4
  • 31
  • 46
  • maybe I do not quite understand this, but why can't you have a `UserID` in each table? – Damir Sudarevic Dec 01 '11 at 12:41
  • yes I can but I want to see which method is more efficient in terms of querying in the database. i.e. whether to add a userID but in turn have large tables because of the large amounts of users or whether each user should have a seperate schema – michelle Dec 01 '11 at 12:52
  • 1
    mishelle- I think all major systems (which have a multi-user model) follow the approach Damir suggested (using a UserId). The "schema for each user" approach will just not scale. Imagine having 50 tables in your database and 10 users trying to create accounts. You'll have to create 500 tables on a running system! – Rajesh Chamarthi Dec 01 '11 at 13:46
  • @RajeshChamarthi: I've worked on running systems that had tens of thousands of tables in production. It's harder on the people than it is on the server. (If you've bought a really good server.) – Mike Sherrill 'Cat Recall' Dec 01 '11 at 14:45

1 Answers1

39

I want to see which method is more efficient in terms of querying in the database.

In a multi-tenant database, querying is only part of the problem. Other parts of the problem are cost, data isolation and protection, maintenance, and disaster recovery. These are significant; you can't consider only query efficiency in a multi-tenant database.

Multi-tenant solutions range from one database per tenant (shared nothing) to one row per tenant (shared everything).

"Shared nothing", "separate database", or one database per tenant

  • Most expensive per client. (Large numbers of clients imply large numbers of servers.)
  • Highest degree of data isolation.
  • Disaster recovery for a single tenant is simple and straightforward.
  • Maintenance is theoretically harder, because changes need to be carried out in every database. But your dbms might easily support running stored procedures in each database. (SQL Server has an undocumented system stored procedure, sp_msforeachdb, for example. You can probably write your own.)
  • "Shared nothing" is the most easily customizable, too, but that also raises more maintenance issues. For example, each tenant might have a different pattern of usage, which suggests each tenant might need some indexes that other tenants wouldn't. That's trivial to do with "shared nothing"; impossible with "shared everything" (below).
  • Lowest number of rows per table. Querying speed is near optimal.

"Shared everything", or "shared schema", or "one database per planet"

  • Least expensive per tenant.
  • Lowest degree of data isolation. Every table has a column that identifies which tenant a row belongs to. Since tenant rows are mixed in every table, it's relatively simple to accidentally expose other tenant's data.
  • Disaster recovery for a single tenant is relatively complicated; you have to restore individual rows in many tables. On the other hand, a single-tenant disaster is relatively unusual. Most disasters will probably affect all tenants.
  • Structural maintenance is simpler, given that all tenants share the tables. It increases the communication load, though, because you have to communicate and coordinate each change with every tenant. It's not easily customizable.
  • Highest number of rows per table. Quick querying is harder, but it depends on how many tenants and how many rows. You could easily tip over into VLDB territory.

Between "shared nothing" and "shared everything" is "shared schema".

"Shared schema"

  • Tenants share a database, but each tenant has it's own named schema. Cost falls between "shared nothing" and "shared everything"; big systems typically need fewer servers than "shared nothing", more servers than "shared everything".
  • Much better isolation than "shared everything". Not quite as much isolation as "shared nothing". (You can GRANT and REVOKE permissions on schemas.)
  • Disaster recovery for a single tenant require restoring one of many schemas. This is either relatively easy or fairly hard, depending on your dbms.
  • Maintenance is easier than "shared nothing"; not as easy as "shared everything". It's relatively simple to write a stored procedure that will execute in each schema in a database. It's easier to share common tables among tenants than with "shared nothing".
  • Usually more active tenants per server than "shared nothing", which means they share (degrade) more resources. But not as bad as "shared everything".

Microsoft has a good article on multi-tenant architecture with more details. (The link is to just one page of a multi-page document. Microsoft has since removed that page; the link is now to a copy in archive.org.)

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • 1
    Shared nothing isn't necessarily fastest real-world, because you wind up having to use *much* cheaper database servers (or more likely virtual machines). One client per $1k database box may very well be slower than 5 clients on a $5k database box—or even a $3k database box. – derobert Dec 01 '11 at 14:49
  • "Shared nothing" doesn't restrict you to $1000 servers. And "shared nothing" doesn't mean one computer per tenant; it means one database per tenant. You're right in implying that stupid hardware decisions can cripple any design. But that's an answer to a different question. – Mike Sherrill 'Cat Recall' May 04 '12 at 22:38
  • Well, first of all, you have a very weird definition of ["shared nothing"](http://en.wikipedia.org/wiki/Shared_nothing) in which both disk and memory are shared. Its also a definition which makes your "shared nothing" basically the same as your "shared schema" (which I guess you really mean separate schema). [SQL Server has some weirdness around backing up/restoring schemas that makes them different, but no other common DB does] And "one computer per tenant" may or may not be a stupid design, it has definite advantages (e.g., one customer generating a ton of load doesn't effect anyone else). – derobert May 05 '12 at 00:50
  • 2
    "Shared nothing" in multi-tenant database architecture doesn't mean the same as "shared nothing" in distributed computing architecture. It's different from a shared schema, regardless of which meaning you pick. "Shared schema" can mean--1) one schema per tenant (multiple tenants per database, each tenant has its own schema and therefore its own tables), or 2) all tenants share every table (multiple tenants per database, all tenants in one schema, for which I prefer the term "shared everything"). Multi-tenant database architecture is independent of hardware and network architecture. – Mike Sherrill 'Cat Recall' May 05 '12 at 01:09