How does one Azure table storage table with many partition keys compare to many tables with fewer partition keys?

Question

I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that the Partition Key is formatted like {root}_{child1}_{child2}_{leaf}. I can understand how it might be beneficial to divide this one big TableA into many tables by using the root dimension of the Partition Keys in the naming of the Tables (so the Partition Key would become {child1}_{child2}_{leaf}).
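To make the proposed change concrete, here is a minimal Python sketch of how a flattened key could be split into a table name and a shorter Partition Key. The function name `split_key` and the sample key are hypothetical, and the sketch assumes the root value is already a legal table name (real table names carry naming restrictions, so a root value may need sanitizing first).

```python
from __future__ import annotations


def split_key(flat_key: str) -> tuple[str, str]:
    """Split '{root}_{child1}_{child2}_{leaf}' into (table_name, partition_key).

    Hypothetical helper: the root segment becomes the table name and the
    remainder becomes the new, shorter Partition Key.
    """
    root, sep, rest = flat_key.partition("_")
    if not sep or not rest:
        raise ValueError(f"expected at least two '_'-separated parts: {flat_key!r}")
    return root, rest


if __name__ == "__main__":
    # Illustrative key only; not taken from the real application.
    table, partition_key = split_key("acme_eu_north_sensor7")
    print(table, partition_key)
```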

What I want to do is provide as rapid access to this data as I can from as many connections at the same time as possible. It would also be incredible if I could figure out what these limits are or should be.

More specific questions about my proposed change:

Will this make a difference in scalability, i.e. the number of simultaneous data access requests that can be served without degrading performance dramatically, or that can be served at the same time at all?
Will this make a difference in average performance? Potential performance?

Please post some sample TPL and async queries – paparazzo Jul 05 '12 at 22:03

score 12 · Accepted Answer · answered Jun 12 '11 at 07:54

If every query specifies a partition key, it makes no difference how many tables those partitions are spread across. In other words, the following are equivalent: one table with a thousand partitions versus a thousand tables each with one partition.

The main reason I can think of to consider splitting out into multiple tables is that you can delete an entire table in a single operation/transaction, while you can't do that with a range of partitions within the same table. That means for things like logs, where you may want to delete the older ones after a while, it's often better to have different tables for different time ranges.
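As a rough illustration of the time-range idea, the following Python sketch names one table per month and works out which tables have aged out and could be dropped whole. The `logs` prefix, the monthly granularity, and the function names are assumptions made for the example; the actual delete call against the storage service is deliberately left out.

```python
from datetime import date

PREFIX = "logs"  # hypothetical table-name prefix


def table_for(day: date) -> str:
    """Return the name of the monthly table that holds entries for `day`."""
    return f"{PREFIX}{day.year:04d}{day.month:02d}"


def expired_tables(existing, today: date, keep_months: int):
    """List tables older than the retention window.

    Keeps the current month plus the `keep_months` months before it;
    names that do not match PREFIX + YYYYMM are ignored.
    """
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in existing:
        suffix = name[len(PREFIX):]
        if not name.startswith(PREFIX) or len(suffix) != 6 or not suffix.isdigit():
            continue
        year, month = int(suffix[:4]), int(suffix[4:])
        if year * 12 + (month - 1) < cutoff:
            expired.append(name)
    return sorted(expired)
```

Each name returned by `expired_tables` could then be removed with a single delete-table operation, instead of deleting the entities inside it one range at a time.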

answered Jun 12 '11 at 07:54

user94559

Interesting, so that I understand then, the IO limitation for concurrent worker roles querying table storage is at the account level? – user483679 Jun 12 '11 at 15:57
There are limits on operations per second at the partition level (table+partition) and at the account level. – user94559 Jun 13 '11 at 19:36

score 6 · Answer 2 · answered Jun 12 '11 at 10:42

+1 for Steve's answer.

Some things to add:

It might be worth considering multiple storage accounts, since the storage account is currently the unit of scalability. Each storage account is officially targeted at about 5,000 entities/transactions per second, so if you need more than that you have to use multiple accounts.
There are some subtleties in how you query your data. If the items are not in the same partition, it's generally quicker to run separate parallel queries than a single query with a complicated where clause.
You may find the blog posts on the storage team blog particularly helpful: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx and http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
You may also need to be aware of the costs: roughly $1 per million hits.
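The "separate parallel queries" point can be sketched as follows. This is a Python illustration of the pattern only, not the TPL code used in the thread: `query_partition` is a hypothetical stand-in that reads from an in-memory dictionary instead of calling the table service, and the partition keys and values are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data: partition key -> {row key -> value}. Purely illustrative.
FAKE_TABLE = {
    "eu_north_sensor7": {"0001": 1.5, "0002": 1.7, "0003": 2.1},
    "eu_south_sensor2": {"0001": 9.0, "0004": 9.4},
}


def query_partition(partition_key, row_from, row_to):
    """Hypothetical single-partition range query: row_from <= RowKey < row_to.

    A real implementation would issue one request to the table service here.
    """
    rows = FAKE_TABLE.get(partition_key, {})
    return [
        (partition_key, row_key, value)
        for row_key, value in sorted(rows.items())
        if row_from <= row_key < row_to
    ]


def query_partitions_in_parallel(partition_keys, row_from, row_to, max_workers=8):
    """Run one query per partition concurrently and merge the results.

    Results are concatenated in the order of `partition_keys`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(query_partition, pk, row_from, row_to)
            for pk in partition_keys
        ]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged
```

Threads suit this shape because each query spends most of its time waiting on the network; the `max_workers` value of 8 is an arbitrary default and would need tuning against the per-partition and per-account limits discussed above.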

answered Jun 12 '11 at 10:42

Stuart

Yes, excellent, thank you for your insights. I had come upon separate parallel queries (one per partition) through my testing, but it's great to know that's actually the right approach. TPL and async queries seem to work well. I will look into multiple accounts. The problem is that I can only have so many accounts, right? It is not immediately clear to me how to logically divide my application into 5 or so pieces that will probably scale. – user483679 Jun 12 '11 at 16:13
To add... It would actually be really beneficial to me for billing purposes if I could create as many table storage accounts as I needed to. A high-level storage account partition that makes sense for the kind of projects we want to do would be at the client level. If we could assign a unique table storage account to every client, then we would probably achieve our IO scalability goals and effectively use your billing system as a piece of our own. – user483679 Jun 12 '11 at 16:36
Just so we're clear... it's not my billing system :) And I think that you can have more than 5 storage accounts - but you'd have to ask Microsoft about this. – Stuart Jun 12 '11 at 16:42
You can definitely ask MS for more storage accounts, but they seem to draw the line at about 20. – knightpfhor Jun 12 '11 at 20:28
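Since the number of storage accounts is capped, one account per client may not be possible. A possible compromise, sketched below in Python, is to assign each client to one account from a fixed pool using a stable hash. The function name, the client id, and the account names are hypothetical; note that billing would then be per pool member, not per client.

```python
import hashlib


def account_for(client_id: str, accounts):
    """Pick a storage account for a client from a fixed pool.

    Uses SHA-256 rather than the built-in hash() so the mapping stays the
    same across processes and restarts.
    """
    if not accounts:
        raise ValueError("the account pool must not be empty")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]


if __name__ == "__main__":
    # Hypothetical pool of five account names.
    pool = [f"appstorage{i}" for i in range(5)]
    print(account_for("client-42", pool))
```

With plain modulo hashing, changing the pool size remaps most clients to different accounts, so the pool would need to be sized up front or paired with a data migration step.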
