Best approach for multi-tenant primary keys

Question

I have a database used by several clients. I don't really want surrogate incremental key values to bleed between clients. I want the numbering to start from 1 and be client specific.

I'll use a two-part composite key of the tenant_id as well as an incremental id.

What is the best way to create an incremental key per tenant?

I am using SQL Server Azure. I'm concerned about locking tables, duplicate keys, etc. I'd typically set the primary key to IDENTITY and move on.

Thanks

"I don't really want surrogate incremental key values to bleed between clients..." I'm curious, why do you not want this to happen? — Michael Fredrickson, Sep 16 '12 at 21:02
Mainly because it will give people an idea of how many rows I'm handling, in my Utopian world they should be completely walled off from one another — Paul Deen, Sep 16 '12 at 21:10
In fact you can have both - surrogate primary key for referential integrity (hidden from customers) and two-column unique indices , e.g: (Cust_id, Order_Number) — alexm, Sep 16 '12 at 22:23
Very true. I'll ponder that some more. Not sure if it feels entirely right, but definitely an idea. Thanks — Paul Deen, Sep 16 '12 at 22:55
You could also use a [GUID](http://en.wikipedia.org/wiki/Globally_unique_identifier)/[UUID](http://en.wikipedia.org/wiki/Universally_unique_identifier) and not have any implicit meaning behind the keys. — Glenn, Sep 16 '12 at 23:07
If the keys are segmented by tenant they're no longer surrogate keys, and you wouldn't have any real need for the tenant_id column either. If it's a surrogate key, it should probably never be shown to the end user. — Larry Lustig, Oct 22 '12 at 17:42

score 2 · Accepted Answer · edited May 23 '17 at 12:10

Are you planning on using SQL Azure Federations in the future? If so, the current version of SQL Azure Federations does not support the use of IDENTITY as part of a clustered index. See this What alternatives exist to using guid as clustered index on tables in SQL Azure (Federations) for more details.

If you haven't looked at Federations yet, you might want to check it out as it provides an interesting way to both shard the database and for tenant isolation within the database.

Depending upon your end goal, using Federations you might be able to use a GUID as the primary clustered index on the table and also use an incremental INT IDENTITY field on the table. This INT IDENTITY field could be shown to end-users. If you are federating on the TenantID each "Tenant table" effectively becomes a silo (as I understand it at least) so the use of IDENTITY on a field within that table would effectively be an ever increasing auto generated value which increments within a given Tenant.

When \ if data is merged together (combining data from multiple Tenants) you would wind up with collisions on this INT IDENTITY field (hence why IDENTITY isn't supported as a primary key in federations) but as long as you aren't using this field as a unique identifier within the system at large you should be ok.

score 1 · Answer 2 · answered Dec 18 '12 at 22:05

If you're looking to duplicate the convenience of having an automatically assigned unique INT key upon insert, you could add an INSTEAD OF INSERT trigger that uses MAX of the existing column +1 to determine the next value.

If the column with the identity value is the first key in an index, the MAX query will be a simple index seek, very efficient.

Transactions will ensure that unique values are assigned but this approach will have different locking semantics than the standard identity column. IIRC, SQL Server can allocate a different identity value for each transaction that requests it in parallel and if a transaction is rolled back, the value(s) allocated to it are discarded. The MAX approach would only allow one transaction to insert rows into the table at a time.

A related approach could be to have a dedicated key value table keyed by the table name, tenant ID and current identity value. It would require the same INSTEAD OF INSERT trigger and more boilerplate to query and keep that key table updated. It wouldn't improve parallel operations though; the lock would just be on a different table's record.

One possibility to fix the locking bottleneck would be to include the current SPID in the key's value (now the identity key is a combination of sequential int and whatever SPID happened to allocate it and not simply sequential), use the dedicated identity value table and insert records there per SPID as necessary; the identity table PK would be (table name, tenant, SPID) and have a non-key column with the current sequential value. That way, each SPID would have its own dynamically allocated identity pool and would only ever have its own SPID specific records locked.

Another downside is maintaining triggers that have to be updated whenever you change the columns in any of the special identity tables.

Best approach for multi-tenant primary keys

2 Answers2

Linked