
I have an Azure table with over a million entries, and I am trying to run about 300,000 queries programmatically in C# in order to transfer some data to another system. Currently I am doing the following as I read through a file that contains the partition and row keys:

while (!reader.EndOfStream)
{
    // parse the reader to get partition and row keys
    string currentQuery = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, rowKey));
    TableQuery<MyEntity> query = new TableQuery<MyEntity>().Where(currentQuery);

    foreach (MyEntity entity in table.ExecuteQuery(query))
    {
        Console.WriteLine(entity.PartitionKey + ", " + entity.RowKey + ", " + entity.Timestamp.DateTime);
    }

    Thread.Sleep(25);
}

This is taking a very long time to complete (5+ hours). From what I can see, the queries take around 200 milliseconds each on average. I am fairly new to Azure, so I figure I am doing something wrong. How can I improve it?


1 Answer


A few things:

  1. Not sure why you have a sleep call in your loop. Unless you're being throttled (storage supports 20,000 transactions per second), you shouldn't need it.
  2. With a given partition key and row key, you'll get exactly one entity back (the combination pk+rk is unique). There's no need to loop through your results; you'll get either zero or one.
  3. You're taking a single-threaded approach, so it's highly unlikely you'll be able to push storage transaction rates very hard. Consider parallelizing your retrievals (see the sketch after this list).
  4. I'm assuming you're not calling Console.WriteLine() in your actual app. If you are, this will slow you down as well.
  5. Consider disabling Nagle's algorithm via ServicePointManager.UseNagleAlgorithm = false;. Otherwise, individual low-level calls to storage might be buffered for up to 500ms to pack the TCP packets more densely. This will be important if you're spending cycles processing the content you read.
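Putting points 2, 3, and 5 together, here's a minimal sketch of what that could look like. It assumes the question's MyEntity type and CloudTable reference, plus a hypothetical list of (PartitionKey, RowKey) pairs already parsed from the file; the degree of parallelism and connection limit are starting points to tune, not recommendations:

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static void TransferEntities(CloudTable table, IEnumerable<Tuple<string, string>> keys)
{
    // Point 5: set these once at startup, before any storage calls are made.
    // Raising the connection limit matters too, or parallel calls get
    // funneled through a handful of sockets.
    ServicePointManager.UseNagleAlgorithm = false;
    ServicePointManager.DefaultConnectionLimit = 100;

    // Points 2 and 3: one point lookup per key, run in parallel.
    Parallel.ForEach(keys,
        new ParallelOptions { MaxDegreeOfParallelism = 16 },
        key =>
        {
            // Retrieve is a direct pk+rk lookup; no filter string, no loop.
            TableOperation retrieve = TableOperation.Retrieve<MyEntity>(key.Item1, key.Item2);
            TableResult result = table.Execute(retrieve);

            var entity = result.Result as MyEntity;
            if (entity != null)
            {
                // Hand the entity off to the target system here.
            }
        });
}

TableOperation.Retrieve is the SDK's point-lookup primitive for exactly this pk+rk case, so it's a natural replacement for the filtered TableQuery in the question.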
– David Makogon
  • Thanks for your answer. As far as parallelizing my code, what would be the best method for doing this? I had tried using threadpools, but my overall runtime wasn't really changing. – JOHN SMITHTY Sep 17 '15 at 21:09
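If thread pools didn't move the needle, the bottleneck may be threads blocked waiting on I/O rather than a lack of threads. One alternative is async requests with a concurrency cap; here's a sketch under the same assumptions as above, with the cap of 32 purely illustrative:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static async Task TransferEntitiesAsync(CloudTable table, IEnumerable<Tuple<string, string>> keys)
{
    // Cap the number of requests in flight at once; no thread is blocked
    // while a request is pending.
    var throttle = new SemaphoreSlim(32);

    var tasks = keys.Select(async key =>
    {
        await throttle.WaitAsync();
        try
        {
            TableOperation retrieve = TableOperation.Retrieve<MyEntity>(key.Item1, key.Item2);
            TableResult result = await table.ExecuteAsync(retrieve);

            var entity = result.Result as MyEntity;
            if (entity != null)
            {
                // Hand the entity off to the target system here.
            }
        }
        finally
        {
            throttle.Release();
        }
    });

    // With ~300,000 keys this creates one Task per key; chunk the key list
    // first if that memory footprint is a concern.
    await Task.WhenAll(tasks);
}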