
Is there a way to force Windows Azure table storage partitions onto distinct physical hardware? The Windows Azure MSDN blog says that the environment automatically load-balances partitions between servers, but I couldn't devise a stress test that shows, quantifiably, that two partitions are on different physical machines.

Consider the following filter in a query:

(PartitionKey == "a" && RowKey == "1") || (PartitionKey == "b" && RowKey == "2")

If the two partitions are on different physical machines, the query could be executed in parallel, addressing the two partition servers simultaneously, so it should evaluate faster. However, I can't find a way to actually measure this performance gain.

What matters more when partitioning: the amount of data in the table, or the 500 queries/sec limit per partition mentioned here?

AvkashChauhan
Tamas

2 Answers


The query you mention is a bad one. Windows Azure storage doesn't optimize OR queries like that, so it will result in a full table scan. You'll definitely want to fire off two queries in parallel yourself and union the results (in this case, just the two entities that come back).
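As a rough illustration of "two queries in parallel, then union", here is a minimal Python sketch. It does not use any real storage SDK: `TABLE` is an in-memory stand-in and `point_query` is a hypothetical helper that a real client would replace with one request filtered on both `PartitionKey` and `RowKey`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table: {(PartitionKey, RowKey): entity}.
TABLE = {
    ("a", "1"): {"PartitionKey": "a", "RowKey": "1", "Value": "first"},
    ("a", "2"): {"PartitionKey": "a", "RowKey": "2", "Value": "other"},
    ("b", "2"): {"PartitionKey": "b", "RowKey": "2", "Value": "second"},
}


def point_query(partition_key, row_key):
    """Hypothetical point lookup (assumption, not a real SDK call).

    A real client would issue one request filtered on both keys.
    """
    entity = TABLE.get((partition_key, row_key))
    return [entity] if entity is not None else []


def query_keys_in_parallel(keys):
    """Run one point query per (PartitionKey, RowKey) pair and union the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        # pool.map preserves the order of `keys` in its results.
        result_lists = pool.map(lambda key: point_query(*key), keys)
        return [entity for result in result_lists for entity in result]


# Replaces the single OR filter with two independent point queries.
entities = query_keys_in_parallel([("a", "1"), ("b", "2")])
```

Each point query names both keys, so neither one needs a scan; the union step is just list concatenation because the two key pairs cannot match the same entity.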

To actually answer your question, I know of no way to force table storage to rebalance partitions.

user94559
  • Where can I find out more about the Windows Azure table storage query optimization? I have only found [this](http://www.slideshare.net/sundararajan009/windows-azure-table-storage-deep-dive), which mentions what you have said. – Tamas Jun 02 '12 at 20:48

You can get better performance (within the limits of 500 queries/sec per partition and 5,000 transactions/sec per storage account) by reading with parallel threads, adding more threads as your stress test requires.

The article linked below describes an experiment in which "I was able to read 365,000 rows by using 365 threads, and I got the data in an average of about 7 seconds. For 30,000 rows spread over 30 partitions using 30 threads, I was averaging 1.4 seconds. Huge win!" It's worth checking.

Azure Table Storage Performance from Massively Parallel Threaded Reading
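A minimal Python sketch of the one-thread-per-partition pattern, recording both per-partition time and overall wall time. Everything here is an assumption for illustration: `PARTITIONS` is an in-memory stand-in shaped like the 30 partitions / 30,000 rows case, and `read_partition` is a hypothetical helper, not a storage SDK call, so the timings it produces say nothing about real table storage.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in: 30 partitions of 1,000 rows each, held in memory.
PARTITIONS = {
    f"p{i:02d}": [{"PartitionKey": f"p{i:02d}", "RowKey": str(r)} for r in range(1000)]
    for i in range(30)
}


def read_partition(partition_key):
    """Hypothetical partition read; returns (partition_key, rows, seconds)."""
    started = time.perf_counter()
    rows = list(PARTITIONS[partition_key])  # a real client would query here
    return partition_key, rows, time.perf_counter() - started


def read_all_partitions(partition_keys):
    """Read every partition on its own thread and time the whole batch."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(partition_keys))) as pool:
        results = list(pool.map(read_partition, partition_keys))
    total_seconds = time.perf_counter() - started
    rows = [row for _, part_rows, _ in results for row in part_rows]
    per_partition_seconds = {key: seconds for key, _, seconds in results}
    return rows, per_partition_seconds, total_seconds


rows, per_partition_seconds, total_seconds = read_all_partitions(sorted(PARTITIONS))
```

Keeping the per-partition timings separate from the overall wall time makes it possible to compare "per query on each partition" against "bulk read including all the threads".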

Community
AvkashChauhan
  • Thanks, I haven't found that. However, I was kind of doing the same, I was using TPL to perform parallel queries against my table with several partitions. Obviously, altogether the execution time is faster, but this does not mean that the partitions are on separate physical servers. I believe that individual queries should become faster if the data has been separated to multiple machines, as in that case each machine would require less time to perform the query. – Tamas May 25 '12 at 15:12
  • As far as I can see, this implies that the query rate needs to be increased to force the load balancer to move partitions to different machines. But this does not seem to happen when I reach the 500 queries/sec per partition limit. – Tamas May 25 '12 at 15:14
  • Are you measuring the performance "per query on each partition" or "on bulk read including all the threads"? – AvkashChauhan May 25 '12 at 15:34
  • Have you come across this: http://stackoverflow.com/questions/4535740/generic-class-for-performing-mass-parallel-queries-feedback – AvkashChauhan May 25 '12 at 15:35
  • I'm measuring both. And obviously I see huge improvement on overall speed with TPL, but I believe single queries should become faster as well if the data gets stored on separate physical machines. How else could I detect that the partitions got load balanced? – Tamas May 25 '12 at 17:39
  • See my answer... you won't see the single query getting faster. What's likely to happen once the partition is split is that you'll only get a portion of the answer back and a continuation token (so you can issue a second query to get the rest of the results). That's how queries that span machine boundaries work. (The system *won't* fan out the query for you.) – user94559 May 26 '12 at 00:27
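The continuation-token behavior described in the last comment can be sketched in Python as a loop that keeps re-issuing the query until no token comes back. This is only an illustration under stated assumptions: `query_page` is a hypothetical paged query over an in-memory list, and it uses an integer offset as the token, whereas real continuation tokens are opaque values returned by the service.

```python
# Hypothetical stand-in data: seven rows in one partition.
ROWS = [{"PartitionKey": "a", "RowKey": str(i)} for i in range(7)]
PAGE_SIZE = 3  # assumed page size, purely for illustration


def query_page(continuation_token=None):
    """Hypothetical paged query: returns (entities, next_token).

    next_token is None once the final page has been returned.
    """
    start = continuation_token or 0
    end = start + PAGE_SIZE
    next_token = end if end < len(ROWS) else None
    return ROWS[start:end], next_token


def query_all():
    """Follow continuation tokens until the full result set is collected."""
    entities = []
    token = None
    while True:
        page, token = query_page(token)
        entities.extend(page)
        if token is None:
            return entities


all_entities = query_all()
```

The caller, not the service, drives the loop: each partial response has to be followed by another request that passes the token back.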