
I have an Azure table with over a million entries, and I am trying to run about 300,000 queries programmatically in C# in order to transfer some data to another system. Currently I am doing the following as I read through a file that contains the partition and row keys:

while (!reader.EndOfStream)
{
    // parse the reader to get partition and row keys
    string currentQuery = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, rowKey));
    TableQuery<MyEntity> query = new TableQuery<MyEntity>().Where(currentQuery);

    foreach (MyEntity entity in table.ExecuteQuery(query))
    {
        Console.WriteLine(entity.PartitionKey + ", " + entity.RowKey + ", " + entity.Timestamp.DateTime);
    }

    Thread.Sleep(25);
}

This is taking a very long time to complete (5+ hours). From what I can see, the queries take around 200 milliseconds each on average. I am fairly new to Azure, so I figure I am doing something wrong. How can I improve it?


1 Answer


A few things:

  1. Not sure why you have a sleep call in your loop. Unless you're being throttled (storage supports 20,000 transactions per second), you shouldn't need it.
  2. With a given partition key and row key, you'll get exactly one entity back (the combination pk+rk is unique). There's no need to loop through your results; you'll get either zero or one.
  3. You're taking a single-threaded approach, so it's highly unlikely you'll be able to push storage transaction rates very hard. Consider parallelizing your retrievals (see the sketch after this list).
  4. I'm assuming you're not calling Console.WriteLine() in your actual app. If you are, this will slow you down as well.
  5. Consider disabling Nagle's algorithm via ServicePointManager.UseNagleAlgorithm = false;. Otherwise, individual low-level calls to storage might be buffered for up to 500ms to pack the TCP packets more densely. This will be important if you're spending cycles processing the content you read.
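Putting points 2, 3, and 5 together, here's a minimal sketch of what that could look like. It assumes the question's MyEntity type and CloudTable reference, plus a hypothetical list of (PartitionKey, RowKey) pairs already parsed from the file; the degree of parallelism and connection limit are starting points to tune, not recommendations:

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static void TransferEntities(CloudTable table, IEnumerable<Tuple<string, string>> keys)
{
    // Point 5: set these once at startup, before any storage calls are made.
    // Raising the connection limit matters too, or parallel calls get
    // funneled through a handful of sockets.
    ServicePointManager.UseNagleAlgorithm = false;
    ServicePointManager.DefaultConnectionLimit = 100;

    // Points 2 and 3: one point lookup per key, run in parallel.
    Parallel.ForEach(keys,
        new ParallelOptions { MaxDegreeOfParallelism = 16 },
        key =>
        {
            // Retrieve is a direct pk+rk lookup; no filter string, no loop.
            TableOperation retrieve = TableOperation.Retrieve<MyEntity>(key.Item1, key.Item2);
            TableResult result = table.Execute(retrieve);

            var entity = result.Result as MyEntity;
            if (entity != null)
            {
                // Hand the entity off to the target system here.
            }
        });
}

TableOperation.Retrieve is the SDK's point-lookup primitive for exactly this pk+rk case, so it's a natural replacement for the filtered TableQuery in the question.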
– David Makogon
  • Thanks for your answer. As far as parallelizing my code, what would be the best method for doing this? I had tried using threadpools, but my overall runtime wasn't really changing. – JOHN SMITHTY Sep 17 '15 at 21:09
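If thread pools didn't move the needle, the bottleneck may be threads blocked waiting on I/O rather than a lack of threads. One alternative is async requests with a concurrency cap; here's a sketch under the same assumptions as above, with the cap of 32 purely illustrative:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

static async Task TransferEntitiesAsync(CloudTable table, IEnumerable<Tuple<string, string>> keys)
{
    // Cap the number of requests in flight at once; no thread is blocked
    // while a request is pending.
    var throttle = new SemaphoreSlim(32);

    var tasks = keys.Select(async key =>
    {
        await throttle.WaitAsync();
        try
        {
            TableOperation retrieve = TableOperation.Retrieve<MyEntity>(key.Item1, key.Item2);
            TableResult result = await table.ExecuteAsync(retrieve);

            var entity = result.Result as MyEntity;
            if (entity != null)
            {
                // Hand the entity off to the target system here.
            }
        }
        finally
        {
            throttle.Release();
        }
    });

    // With ~300,000 keys this creates one Task per key; chunk the key list
    // first if that memory footprint is a concern.
    await Task.WhenAll(tasks);
}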