10

I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that the Partition Key is formatted like {root}_{child1}_{child2}_{leaf}. I can understand how it might be beneficial to divide this one big TableA into many tables by using the root dimension of the Partition Keys in the naming of the Tables (so the Partition Key would become {child1}_{child2}_{leaf}).
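A minimal sketch of the proposed key mapping, assuming the flattened key always uses `_` as its separator and that the root segment is safe to embed in a table name. The helper names `split_key` and `table_name` and the `TableA` prefix are hypothetical; the validation reflects the Azure table naming rules as I understand them (alphanumeric, starting with a letter, 3 to 63 characters).

```python
from typing import Tuple


def split_key(flat_key: str, sep: str = "_") -> Tuple[str, str]:
    """Split '{root}_{child1}_{child2}_{leaf}' into (root, '{child1}_{child2}_{leaf}')."""
    root, found, rest = flat_key.partition(sep)
    if not found or not root or not rest:
        raise ValueError(f"expected at least two {sep!r}-separated parts: {flat_key!r}")
    return root, rest


def table_name(root: str, prefix: str = "TableA") -> str:
    """Build a per-root table name (hypothetical naming scheme: prefix + root)."""
    name = prefix + root
    # Assumed naming rules: ASCII alphanumeric, first character a letter, 3-63 chars.
    valid = name.isascii() and name.isalnum() and name[0].isalpha() and 3 <= len(name) <= 63
    if not valid:
        raise ValueError(f"not a valid table name: {name!r}")
    return name


root, partition_key = split_key("acme_eu_sales_q1")
print(table_name(root), partition_key)
```

The row keys are untouched by this mapping, so a range query within a partition looks the same in either layout; only the table it is addressed to changes.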

What I want to do is provide as rapid access to this data as I can from as many connections at the same time as possible. It would also be incredible if I could figure out what these limits are or should be.

More specific questions about my proposed change:

  1. Will this make a difference in scalability, i.e. the number of simultaneous data access requests that can be served without affecting performance dramatically? Or the number that can be served at the same time at all?
  2. Will this make a difference in average performance? Potential performance?
Brian Tompsett - 汤莱恩
user483679

2 Answers

12

If every query specifies a partition key, it makes no difference how many tables those partitions are spread across. In other words, the following are equivalent: one table with a thousand partitions versus a thousand tables each with one partition.

The main reason I can think of to consider splitting out into multiple tables is that you can delete an entire table in a single operation/transaction, while you can't do that with a range of partitions within the same table. That means for things like logs, where you may want to delete the older ones after a while, it's often better to have different tables for different time ranges.
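To illustrate the log case, here is a rough sketch of naming tables by month and picking out the ones old enough to drop. The `Logs` prefix, the `YYYYMM` suffix, and both function names are assumptions made for the example; the actual deletion call against the storage service is deliberately left out.

```python
from datetime import date


def log_table_name(day: date, prefix: str = "Logs") -> str:
    """Return the monthly table a log entry for `day` would go into, e.g. 'Logs201106'."""
    return f"{prefix}{day:%Y%m}"


def expired_tables(table_names, today: date, keep_months: int, prefix: str = "Logs"):
    """List tables older than the current month plus `keep_months` previous months."""
    # Count months on a single axis so year boundaries need no special handling.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in table_names:
        stamp = name[len(prefix):]
        if not name.startswith(prefix) or len(stamp) != 6 or not stamp.isdigit():
            continue  # not one of our monthly log tables
        month_index = int(stamp[:4]) * 12 + int(stamp[4:]) - 1
        if month_index < cutoff:
            expired.append(name)
    return expired


tables = ["Logs201101", "Logs201105", "Logs201106", "Other"]
print(expired_tables(tables, date(2011, 6, 12), keep_months=2))
```

Each name returned would then be removed with one table-level delete, rather than by deleting entities partition by partition.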

user94559
  • Interesting, so that I understand then, the IO limitation for concurrent worker roles querying table storage is at the account level? – user483679 Jun 12 '11 at 15:57
  • There are limits on operations per second at the partition level (table+partition) and at the account level. – user94559 Jun 13 '11 at 19:36
6

+1 for Steve's answer.

Some things to add

Stuart
  • Yes, excellent, thank you for your insights. I had come upon separate parallel queries (one per partition) through my testing, but it's great to know that's actually the right approach. TPL and async queries seem to work well. I will look into multiple accounts. The problem is that I can only have so many accounts, right? It is not immediately clear to me how to logically divide my application into 5 or so pieces that will probably scale. – user483679 Jun 12 '11 at 16:13
  • To add... It would actually be really beneficial to me for billing purposes if I could create as many table storage accounts as I needed to. A high-level storage account partition that makes sense for the kind of projects we want to do would be at the client level. If we could assign a unique table storage account to every client, then we would probably achieve our IO scalability goals and effectively use your billing system as a piece of our own. – user483679 Jun 12 '11 at 16:36
  • Just so we're clear... it's not my billing system :) And I think that you can have more than 5 storage accounts - but you'd have to ask Microsoft about this. – Stuart Jun 12 '11 at 16:42
  • You can definitely ask MS for more storage accounts, but they seem to draw the line at about 20. – knightpfhor Jun 12 '11 at 20:28
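
The one-query-per-partition approach mentioned in the comments above can be sketched roughly as follows. This uses Python's `ThreadPoolExecutor` as a stand-in for the TPL/async calls the commenter describes, and `FAKE_TABLE` and `query_partition` are hypothetical placeholders for a real table storage client, so treat it as an illustration of the fan-out pattern only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for one table: {partition_key: {row_key: value}}
FAKE_TABLE = {
    "eu_sales_q1": {"001": "a", "002": "b", "003": "c"},
    "eu_sales_q2": {"001": "d", "004": "e"},
}


def query_partition(partition_key, row_from, row_to):
    """Placeholder for a single-partition row-key range query (inclusive bounds)."""
    rows = FAKE_TABLE.get(partition_key, {})
    return [(partition_key, rk, v) for rk, v in sorted(rows.items()) if row_from <= rk <= row_to]


def query_partitions(partition_keys, row_from, row_to, max_workers=8):
    """Run one range query per partition concurrently and merge the results."""
    partition_keys = list(partition_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of partition_keys, not completion order.
        per_partition = pool.map(
            lambda pk: query_partition(pk, row_from, row_to), partition_keys
        )
        return [row for rows in per_partition for row in rows]


print(query_partitions(["eu_sales_q1", "eu_sales_q2"], "001", "002"))
```

Because each request targets exactly one partition, the requests are independent of each other and the per-partition limits mentioned earlier apply to each one separately.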