17

We're creating a multi-tenant application that must segregate data between tenants. Each tenant will save various documents, each of which can fall into several different document categories. We plan to use Azure Blob storage for these documents. However, given our user base and the number and size of the documents, we're not sure how to best manage storage accounts within our current Azure subscription.

Here are some numbers to consider: 5,000 users × 27,000 documents per user per year × 8 MB per document = 1,080 TB per year in total, while a storage account maxes out at 500 TB.

So my question is what would be the most efficient and cost effective way to store this data and stay within the Azure limits?

Here are a few things we've considered:

  1. Create a storage account for each client. THIS DOES NOT WORK because you can only have 100 storage accounts per subscription (otherwise this would have been the ideal solution).

  2. Create a blob container for each client. A storage account can hold up to 500TB, so this could potentially work, except that eventually we would have to spill over into additional storage accounts. I'm not sure how that would work if a client ended up with data in two accounts. It could get messy.

Perhaps we are missing something fundamentally simple here.

UPDATE For now our thought is to use Azure Table storage with a table for each document type. Within each table the partition key would be the tenant's ID, and the row key would be the document ID. Each row would also contain metadata for the document, along with a URI (or similar) linking to the blob itself.
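
For illustration, a minimal sketch of that layout using the current azure-data-tables Python SDK; the connection string, table name ("invoices" standing in for one document type), and metadata fields are placeholders, not part of the actual design:

```python
from azure.data.tables import TableServiceClient

# Placeholder connection string; one table per document type.
service = TableServiceClient.from_connection_string("<connection-string>")
table = service.create_table_if_not_exists("invoices")

# PartitionKey = tenant ID, RowKey = document ID; the remaining fields are
# document metadata plus a URI pointing at the blob holding the document.
table.create_entity({
    "PartitionKey": "tenant-42",
    "RowKey": "doc-0001",
    "FileName": "q1-report.pdf",
    "ContentType": "application/pdf",
    "SizeBytes": 8_388_608,
    "BlobUri": "https://<account>.blob.core.windows.net/tenant-42/doc-0001",
})
```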

spoof3r
  • Will you be storing the client/files relationship in some kind of table? For example, a master table which would store the list of all files for all clients? – Gaurav Mantri Mar 22 '15 at 03:52
  • @Gaurav Mantri: Great question! I have provided an update to address your question. – spoof3r Mar 22 '15 at 04:00

2 Answers

15

Not really an answer, but think of it as "food for thought" :). Basically, each storage account has certain scalability targets, and your architecture should be designed so that you don't exceed them, in order to maintain high availability of storage for your application.

Some recommendations:

  • Start by creating multiple storage accounts (say 10 to begin with). Let's call them Pods.
  • Each tenant gets one of the pods. You can pick a pod storage account randomly or use some predefined logic. The pod assignment is stored alongside the tenant information.
  • From the description it seems that you're currently storing the file information in just one table. This would put a lot of stress on that one table/storage account, which is not a scalable design IMHO. Instead, when a tenant is created, assign a pod to the tenant and then create a table for that tenant which will store its file information. This has the following benefits: 1) each tenant's data is nicely isolated, 2) read requests are load-balanced, allowing you to stay within the scalability targets, and 3) since each tenant's data lives in a separate table, your PartitionKey becomes free and you can assign some other value to it if needed. (A sketch of this provisioning step follows this list.)
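
A minimal sketch of that provisioning step in Python, assuming a hypothetical pod registry and persistence helper (neither is part of the original answer):

```python
import random
from azure.data.tables import TableServiceClient

# Hypothetical pod registry: pod name -> storage account connection string.
PODS = {
    "pod01": "<pod01-connection-string>",
    "pod02": "<pod02-connection-string>",
}

def provision_tenant(tenant_id: str) -> str:
    """Assign a pod to a new tenant and create its per-tenant files table."""
    pod = random.choice(list(PODS))  # or some predefined logic
    service = TableServiceClient.from_connection_string(PODS[pod])
    # Table names must be alphanumeric, so strip any dashes from the ID.
    service.create_table_if_not_exists(f"files{tenant_id.replace('-', '')}")
    save_tenant_pod(tenant_id, pod)  # hypothetical: persist pod with tenant info
    return pod
```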

Now coming on to storing files:

  • Again you can go with the Pod concept wherein files for each tenant reside in the pod storage account for that tenant.
  • If you see issues with this approach, you can instead pick a pod storage account at random for each file, put the file there, and store the blob URL in the Files table.
  • You could either go with just one blob container (say tenant-files) or separate blob containers for each tenant.
  • With just one blob container for all tenants, the management overhead is smaller, as you just have to create this container when a new pod is commissioned. However, the downside is that you can't logically separate files by tenant, so if you want to provide direct access to the files (using a Shared Access Signature), it would be problematic.
  • With separate blob containers for each tenant, the management overhead is higher but you get nice logical isolation. In this case, as a tenant is brought on board, you would have to create a container for that tenant in each pod storage account. Similarly, when a new pod is commissioned, you have to ensure that a blob container is created in it for each tenant in the system. (A sketch of this on-boarding step follows this list.)
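
As a sketch of the per-tenant-container option, the on-boarding step might look like this with the azure-storage-blob Python SDK (the connection-string list and the naming scheme are assumptions):

```python
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

def onboard_tenant_containers(tenant_id: str, pod_connection_strings: list[str]) -> None:
    """Create the tenant's blob container in every pod storage account."""
    container = f"tenant-{tenant_id}".lower()  # names must be lowercase, 3-63 chars
    for conn_str in pod_connection_strings:
        service = BlobServiceClient.from_connection_string(conn_str)
        try:
            service.create_container(container)
        except ResourceExistsError:
            pass  # already provisioned, e.g. when re-running on-boarding
```

With per-tenant containers in place, a container-scoped Shared Access Signature (generate_container_sas in the same SDK) grants a tenant access to exactly its own files, which addresses the SAS concern above.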

Hope this gives you some idea about how you can go about architecting your solution. We're using some of these concepts in our solution (which explicitly uses Azure Storage as data store). It would be really interesting to see what architecture you come up with.

Gaurav Mantri
  • I'm using the same approach: a table per tenant allows easy 'per tenant' migration. Also, if one tenant becomes very noisy you can move it to a separate account by changing pod settings. If you store the subscription in pod settings you can bust the storage accounts per subscription limit by using multiple subscriptions. – Jakub Konecki Mar 22 '15 at 08:10
  • I like the idea of one table per tenant. How could I make this work if I have two entity types that documents can be associated with? Right now a tenant can have documents associated with a 'session' or a 'location'. I could have sessionId = 1 and a locationId = 1, which means I can't have the partition key as '1' because it could reference a session or a location. Would I just prepend the entity name to the entity's ID and use that as the partition key? Ex: tenant A has its own table, and the partition key 'session1' refers to session ID 1, and 'location1' refers to location ID 1? – spoof3r Mar 22 '15 at 14:34
  • Without knowing your application requirement in details, I think it would be hard for me to comment on this aspect. But one thing you need to keep in mind is how are you going to query this table. Based on that you may end up creating 2 separate tables per tenant. You may find the following links helpful: http://stackoverflow.com/questions/15809078/design-of-partitioning-for-azure-table-storage and http://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/. HTH. – Gaurav Mantri Mar 22 '15 at 14:42
  • I'm trying to understand the last point in your answer where you mention I could use separate blob containers for each tenant. You said when a new tenant is brought on board, I would need to create a container for that tenant (which makes sense), for EACH POD STORAGE ACCOUNT. This last part is confusing me. Why would I need to create a blob container for this tenant in each of my "pods" (storage accounts)? For new clients wouldn't I just need to pick a pod and then create the necessary tables and blob container within that single pod?? – spoof3r Mar 22 '15 at 20:07
  • Sorry I was not clear. With the pod approach, for storing files you could go 2 ways: 1) Have a dedicated pod for each tenant for storing files. In this case, you would create a blob container in the tenant's pod storage account (when the tenant is commissioned) and all files for that tenant would be stored there. 2) Alternatively, pick a random pod storage account when it comes to saving a file and store the blob URL in the tenant table. In this case, one file for a tenant could go in one pod while another file could go in a different one. My comment was for the 2nd scenario. HTH. – Gaurav Mantri Mar 23 '15 at 03:15
  • One issue I am having: with a container per tenant, this does not work if you want to use Azure Functions, since a blob trigger requires the path of a container; I don't think you can bind containers dynamically in Azure Functions triggers. – Jonathan Feb 14 '23 at 20:56
9

I am just going to share my thoughts on the topic, and they do overlap somewhat with Gaurav Mantri's answer. This is based on a design that I came up with after doing something very similar at my current work.

Azure Blob storage

  1. Randomly select a pod from the pod pool when a tenant is created and store its namespace along with the tenant information.

  2. Provide an API for creating containers, where container names are a composite of the tenant ID (Guid::ToString("N")) and a <resourcename>; see the naming sketch below. You don't need to sell these to your users as containers; they can be folders, worksets, or fileboxes. Pick whatever name fits.

  3. Provide an API for maintaining documents within these containers.

This means that you can simply grow the pod pool as you get more tenants, and e.g. remove pods that are filling up from the pool of candidates for new tenants.
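
A minimal sketch of the naming scheme from point 2; the helper itself is illustrative, and the 63-character cap comes from Azure's container-name rules:

```python
import uuid

def container_name(tenant_id: uuid.UUID, resource: str) -> str:
    # uuid.hex matches Guid::ToString("N"): 32 hex digits, no dashes.
    # Container names must be lowercase and at most 63 characters long.
    return (tenant_id.hex + resource.lower())[:63]

print(container_name(uuid.uuid4(), "worksets"))
# e.g. '6f1db6f1b69b4f0f9f3a2f0c9d8e7a5bworksets'
```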

The benefit of this is that you do not need to maintain two systems for your data, using both Table storage and Blob storage; Blob storage already has a way to present data as a directory/file hierarchy.

Extension Points

Blob Storage API Broker

On top of the above design I made an OWIN middleware that sits between clients and Blob storage, basically just forwarding requests from clients to Blob storage. This step is of course not needed, as you can hand out normal SAS tokens and talk directly to Blob storage from clients. But it makes it easy to hook in when actions are executed on files. Each tenant gets its own endpoint: files/tenantid/<resourcename>/

Such an API would also enable you to hook into whatever token authentication system you may already be using to authenticate and authorize the incoming requests, and then sign the requests in this API.
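
The original middleware is OWIN/.NET; purely to illustrate the idea, here is a hedged Python/Flask sketch of one simple variant that, instead of proxying, authenticates the request and redirects to a short-lived blob-scoped SAS URL (the account details and the validate_token helper are assumptions):

```python
from datetime import datetime, timedelta, timezone

from flask import Flask, abort, redirect, request
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

app = Flask(__name__)
ACCOUNT, KEY = "<account>", "<account-key>"  # placeholders

@app.route("/files/<tenantid>/<resourcename>/<path:blob>")
def get_file(tenantid, resourcename, blob):
    # Hypothetical check against whatever token system you already use.
    if not validate_token(request.headers.get("Authorization"), tenantid):
        abort(401)
    sas = generate_blob_sas(
        account_name=ACCOUNT,
        container_name=f"{tenantid}{resourcename}",
        blob_name=blob,
        account_key=KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(minutes=5),
    )
    return redirect(
        f"https://{ACCOUNT}.blob.core.windows.net/{tenantid}{resourcename}/{blob}?{sas}"
    )
```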

Blob Storage Metadata

Using the above API broker, combined with blob metadata, one can take it a step further and modify incoming requests to always include metadata, and add filters on the XML returned by Blob storage before sending it to clients, to filter out containers or blobs. One example: when a user deletes a blob, set x-ms-meta-status:deleted on it and filter such blobs out when returning blobs/containers. This way you can run different procedures for deleting data behind the scenes.

One should be careful here: you don't want to put too much logic in the broker, since it adds a penalty to every request, but done smartly this can work very nicely.

This extension would also allow your users to create "empty" subfolders inside a container, by placing a zero-byte file with status:hidden that is likewise filtered out (remember that Blob storage can only show virtual folders if there is something in them). This could also be achieved using Table storage.
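
As a sketch of that convention, using the azure-storage-blob Python SDK client-side rather than the XML-rewriting middleware itself:

```python
from azure.storage.blob import ContainerClient

def soft_delete(container: ContainerClient, blob_name: str) -> None:
    # Mark the blob as deleted instead of actually removing it.
    container.get_blob_client(blob_name).set_blob_metadata({"status": "deleted"})

def list_visible(container: ContainerClient):
    # Hide blobs flagged as deleted or hidden (e.g. zero-byte folder placeholders).
    for blob in container.list_blobs(include=["metadata"]):
        if (blob.metadata or {}).get("status") not in ("deleted", "hidden"):
            yield blob
```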

Azure Search

Another great extension point, and most likely my favorite: for each blob you could index its content in Azure Search so it can be found again. I don't see any good solution using just Blob storage or Table storage that gives you good search functionality, or to some extent even a good filtering experience. Azure Search would give users a really rich experience for finding their content.
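
A hedged sketch with the azure-search-documents Python SDK; the endpoint, key, and index schema are placeholders, and the index itself would need to be created separately with tenantId marked filterable:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search = SearchClient(
    endpoint="https://<service>.search.windows.net",
    index_name="tenant-documents",
    credential=AzureKeyCredential("<api-key>"),
)

# Index a document's extracted text and metadata when its blob is uploaded...
search.upload_documents([{
    "id": "tenant-42-doc-0001",
    "tenantId": "tenant-42",
    "fileName": "q1-report.pdf",
    "content": "<extracted text>",
}])

# ...and later find it again, scoped to the tenant.
results = search.search("quarterly report", filter="tenantId eq 'tenant-42'")
```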

Snapshots

Another extension: snapshots could be created automatically every time a file is modified. This becomes even easier with the broker API; otherwise, monitoring the storage logs is an option.
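
A minimal sketch, assuming the broker sits in the write path (BlobClient is from azure-storage-blob):

```python
from azure.storage.blob import BlobClient

def upload_with_snapshot(blob: BlobClient, data: bytes) -> None:
    # Keep a point-in-time copy of the previous version before overwriting.
    if blob.exists():
        blob.create_snapshot()
    blob.upload_blob(data, overwrite=True)
```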

These ideas come from a project that I started and wanted to share, but since I am busy at work in the coming months, I don't see myself releasing it before the summer holidays give me time to finish. The motivation of the project is to provide a NuGet package that enables other developers to quickly set up the broker API mentioned above and configure a multi-tenant blob storage solution.

I kindly ask you to vote this answer up if you read this and believe such a project could have saved you time in your current development process. That way I can see whether I should spend more time on the project.

I think that Gaurav Mantri's answer is more spot-on for the question above, but I just wanted to share my ideas on the topic.

Poul K. Sørensen