26

Every tutorial I've seen about Multitenant database models tells you to put the TenantID in every single table:

zoos
-------
id
zoo_name
tenant_id

animals
-------
id
zoo_id
animal_name
tenant_id

However, this seems redundant to me. Why not add the tenant_id column to just the zoos table and exploit the foreign key relationship between zoos and animals?

Do you add tenant_id to every table just to keep the joins from getting too crazy? Is it a safeguard against bugs? A performance consideration?

Blixt
  • 49,547
  • 13
  • 120
  • 153
Mike Sickler
  • 33,662
  • 21
  • 64
  • 90
  • 1
    There are considerations when designing multitenant databases, read on an MSDN Article: [Multi-Tenant Data Architecture](http://msdn.microsoft.com/en-us/library/aa479086.aspx) In short there are several approaches, and it's a continuum of choices and consequences. Make an informed decision based on requirements. – Hendy Irawan Jan 11 '12 at 09:47

6 Answers6

16

If one of your key design considerations is security--specifically, one client can no way no how no when access another client's data--then, depending on how you implement this security, sticking that qualifying column in every table may be necessary. One such tactic described here requires building a view on every table; assuming each table contains a tenantId column, then if properly configured each view could contain a "WHERE tenantId = SUSER_SID()" clause (and of course you configure the database so that clients can only access the views).

Another factor (as in my current job) is loading warehouse data (ETL). Tables are partitioned on tenantId (we use table partitioning, but partitioned views would also work), and data can be easily loaded or unloaded for a client without seriously impacting any other client.

But as ever, there's a lot of "it depends" involved. If there is no clear and present need, and a very low likelihood of future need, then normalize that column out. Just realize that it's more a devise of physical implementation than of conceptual or logical database design.

Philip Kelley
  • 39,426
  • 11
  • 57
  • 92
12

Its there for convenience and performance - in terms of normalisation you're absolutely right, it only needs to go in at the top. The problem then becomes that to get to some data (say zoo -> animal -> food -> supplier) you have to have horribly complex joins on what are notionally very simple queries.

So in the real world one has to compromise - question then becomes where and to what extent.

See this article Maybe Normalizing Isn't Normal - and its conclusion:

As the old adage goes, normalize until it hurts, denormalize until it works

as a place to start exploring the subject

rmoestl
  • 3,059
  • 5
  • 24
  • 38
Murph
  • 9,985
  • 2
  • 26
  • 41
  • At the risk of starting a nerd religious war, I can't help but wonder if this isn't an example of where a natural key would be a useful fit. If you're multi-tenant app is partitioned by domain name, and you set the tenant_id to the domain name, you can reduce joins that way as well. – Paul Mar 17 '10 at 19:14
  • It won't reduce joins - you still have the problem of either including the key at multiple levels (notionally bad) or of complex joins (also notionally bad) and regardless natural keys almost always come back and bite you (-: – Murph Mar 18 '10 at 09:27
  • 1
    -1. If you normalize a multi-tenant, shared relation to either 3NF or 5NF, you'll end up with the tenant identifier in every table. – Mike Sherrill 'Cat Recall' Jan 11 '12 at 13:07
9

If I had tenantID at the top of the hierarchy (i.e. at the zoo level) you have several issues to consider.

  1. The top of the hierarchy can never change, for example if you need to add a node on the tree above the zoo level (say regions -> zoos -> animals) then it will force a re-org every time.
  2. For certain queries, you will be forced to start at the top of the hierarchy, i.e. give me a list of all animals available will force you to start at the top of the tree
  3. Why not use schemas ? Each tenant is isolated within their own schema. This will also separate the data-sets nicely.
blispr
  • 883
  • 5
  • 10
7

The first thing that springs to mind is that it's slower to look up animals > zoos > tenants than simply animals > tenants. And most likely this is a lookup you will do often (for example, "get all animals for a certain tenant, regardless of zoo").

For small to mid-sized applications you can get away with a more normalized structure, but for the sake of efficiency, you should go with extraneous data (and generally speaking, multitenancy applications are not small). Just make sure it doesn't go "out of sync", which is a risk that comes with having redundant data.

To answer your last paragraph, the reason is performance, pure and simple. Joins are no bad thing; they help you keep a piece of data in one place rather than three. It's definitely not to prevent bugs. Adding a tenant_id field to more tables will increase the risk of bugs (although for an id that never changes, it wouldn't be as much of an issue).

Blixt
  • 49,547
  • 13
  • 120
  • 153
  • 1
    -1. Storing the tenant id only int the zoo table isn't "more normalized". Storing it in every table as a foreign key isn't extraneous; that's what you're *supposed* to do with foreign keys. – Mike Sherrill 'Cat Recall' Jan 11 '12 at 13:03
  • Yes it is. The more normalized a database is, the less redundancy it has. I think you'll agree that storing an id in multiple places is redundancy. I never said the fields would be first-class foreign keys that handle data consistency, but even if they were that would reduce performance due to extra checks. The question was to explain a reason for why the field was added to multiple tables, which I think I did. So I'd say your -1 is unwarranted. – Blixt Jan 11 '12 at 16:36
  • 1
    Foreign keys aren't redundant. They're a central and distinguishing feature of the relational model. In a multi-tenant, shared schema architecture, all the foreign keys are compound keys composed of tenant_id plus something else. That tenant id is the only thing that distinguishes one tenant's rows from every other tenant's rows. Leave the tenant id out of a table, and you also have to disallow inserts, updates, and deletes. (Think about it for a minute.) – Mike Sherrill 'Cat Recall' Jan 13 '12 at 03:06
  • 1
    I mean redundant in the "duplicate data" sense, not the "unnecessary" sense. Foreign keys have a useful function, I agree. I don't agree that they are *necessary* for the data model, however. If an animal belongs to a zoo that in turn belongs to a tenant, then all the data is there to infer that the animal belongs to that tenant. I think you misunderstood me, it seems that you think I was saying that the tenant id field shouldn't exist at all. I was just saying that it only needs to exist twice (tenant table and zoo table.) Having it in animals would be redundant but more efficient for reads. – Blixt Jan 13 '12 at 21:25
  • And for the tenant id being the only distinguishing thing between tenants, that depends on how you structure your data. You seem to make the assumption that id generation is completely isolated per tenant, and if that is the case, then yes you do need to make a composite with the tenant id. But whether ids are isolated per tenant or global wasn't specified in the question, so I still don't see why you thinking it should be isolated warrants a -1 to the answers here. – Blixt Jan 13 '12 at 21:34
  • When you have a "chain" of tables, normalization won't tell you to store one foreign key in the next table, a different foreign key in the table after that, yet another foreign key in the table after that, and so on. In fact, storing FKs that way is essentially an implementation of IMS in SQL--IMS is one of the problems the relational model solved. In addition, a multi-tenant database is essentially 'n' independent databases implemented in a single SQL database (where 'n' is the number of tenants), and storing the tenant id in every table is the only thing that keeps foreign keys working. – Mike Sherrill 'Cat Recall' Jan 27 '12 at 23:17
  • 1
    TenantId as a "foreign key" does not seems like redundant data. – Denis Kucherov Feb 10 '18 at 18:48
  • 2
    I think there's just a language issue here. Any value that can be inferred is "redundant", aka "denormalized", data. It's usually kept for performance reasons. In this case the `tenant_id` is *denormalized* or *redundant* because it is stored twice, and a data inconsistency/corruption would occur if you accidentally change one and not the other. A fully normalized database does not allow this. Please read: https://en.wikipedia.org/wiki/Database_normalization – Blixt Feb 11 '18 at 15:07
0

Well, Bob may own a giraffe in zoo No1, while Joe may own a lion in the same zoo. They are not supposed to look at each others' data.

Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
-2

The reason N1 is for security.

Security need to be a strong concept in multi-tenant application.

Suppose that you give a user the ability to modify a Animal. You create a form whith a select that show the zoo fo the current tenant. What happen if the user hack the form and pass a zoo id of another tenant?

The animal will be moved to another zoo of another tenant!!

This is real pain in a multi tenant app!

Marco Staffoli
  • 2,475
  • 2
  • 27
  • 29
  • 1
    You could do checks on the server side to make sure the user has rights to that ID. You would need to do that regardless of the db design, but I get your point. – Mike Sickler Nov 12 '10 at 19:27
  • 3
    What happens if the user hacks the form and supplies id of an animal that belong to another tenant? What happens if the user hacks the "my profile" page to supply the id of the super-admin-account? You simply cannot trust the user's input - that applies to any system where users have different permissions, not just multi-tenancy. – Oskar Berggren Dec 03 '12 at 19:15
  • 1
    This is mitigated in the authentication and the client has no knowledge of the tenant_id. The tenant_id is only apparent in the server side. So in the case that a user manually passes a tenant_id, it is useless since tenant_id is not retrieved from the client rather obtained from an app-state variable in the server logic. – Dohd Oct 24 '20 at 21:22