
I'm using Azure.Data.Tables (12.6.1) and I need to query a single record from multiple partitions of a single table (so the result would be multiple records, 1 from each partition). Each entity needs to be looked up by its partition key and row key - for a single TableClient.GetEntity() call this would be a point query.

After reading the documentation, I'm unsure whether it's efficient to call TableClient.QueryAsync() with multiple partition key / row key pairs, and the search results I found give contradictory suggestions.

Is it efficient to do this for a number of partition key / row key combinations (up to ~50), or is it better to call GetEntity() once for each entity?

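var filter = "(PartitionKey eq 'p1' and RowKey eq 'r1') or " +
    "(PartitionKey eq 'p2' and RowKey eq 'r2') or ...";
// QueryAsync<T> returns an AsyncPageable<T>; enumerate it with await foreach rather than awaiting the call.
AsyncPageable<TableEntity> results =
    tableClient.QueryAsync<TableEntity>(filter, maxPerPage: 500, cancellationToken: cancelToken);
await foreach (TableEntity entity in results) { /* process entity */ }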
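With ~50 pairs, writing the filter by hand gets error-prone. The following is a minimal sketch of generating it, assuming the keys arrive as (PartitionKey, RowKey) tuples. The helper name BuildPointLookupFilter is hypothetical. It escapes single quotes by doubling them, which is how OData string literals escape a quote; for a single interpolated expression, Azure.Data.Tables also provides TableClient.CreateQueryFilter, which handles that escaping. The sketch rejects an empty key list, because an empty filter would no longer restrict the query to the requested keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds "(PartitionKey eq '..' and RowKey eq '..') or (...)" for a set of key pairs.
static string BuildPointLookupFilter(IEnumerable<(string PartitionKey, string RowKey)> keys)
{
    // OData string literals escape an embedded single quote by doubling it.
    static string Quote(string value) => "'" + value.Replace("'", "''") + "'";

    var clauses = keys
        .Select(k => $"(PartitionKey eq {Quote(k.PartitionKey)} and RowKey eq {Quote(k.RowKey)})")
        .ToList();

    // An empty filter would not restrict the query to any particular keys, so refuse it.
    if (clauses.Count == 0)
        throw new ArgumentException("At least one key pair is required.", nameof(keys));

    return string.Join(" or ", clauses);
}

var filter = BuildPointLookupFilter(new[] { ("p1", "r1"), ("p2", "r2"), ("o'brien", "r3") });
Console.WriteLine(filter);
```

The resulting string is what would be passed as the filter argument to tableClient.QueryAsync<TableEntity>(...). This only builds the filter; it says nothing about whether the service executes the OR'd clauses as independent point queries, which is the open question here.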
xxbbcc

2 Answers


I don't know if there is a definitive answer here, as it probably depends on your specific requirements. I would suggest testing different options and tuning accordingly.

For reference, here is a general guide to designing tables for query performance: https://learn.microsoft.com/azure/storage/tables/table-storage-design-for-query
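One way to compare the options on your own workload is a small timing harness like the sketch below: it runs each strategy several times and reports the median wall-clock time. The helper name MedianMillisecondsAsync and the Task.Delay calls are hypothetical placeholders; in practice each delegate would run the OR-filter query or the batch of GetEntityAsync calls against a real table.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: runs an async strategy `runs` times and returns the median time in milliseconds.
// The median keeps a single slow network round trip from dominating the result.
static async Task<double> MedianMillisecondsAsync(Func<Task> strategy, int runs)
{
    if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

    var samples = new List<double>(runs);
    for (int i = 0; i < runs; i++)
    {
        var sw = Stopwatch.StartNew();
        await strategy();
        sw.Stop();
        samples.Add(sw.Elapsed.TotalMilliseconds);
    }

    samples.Sort();
    return runs % 2 == 1
        ? samples[runs / 2]
        : (samples[runs / 2 - 1] + samples[runs / 2]) / 2;
}

// Placeholder strategies: replace the delays with the real OR query and the parallel point queries.
double orQuery = await MedianMillisecondsAsync(() => Task.Delay(20), runs: 5);
double pointQueries = await MedianMillisecondsAsync(() => Task.Delay(10), runs: 5);
Console.WriteLine($"OR query: {orQuery:F1} ms, point queries: {pointQueries:F1} ms");
```

Small-scale numbers like these won't necessarily reflect production load, but they can at least show whether one approach is clearly slower for a given batch size.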

Christopher Scott
  • Thank you, I'm aware of that page. Unfortunately, it doesn't say whether multiple query parameters (partition key + row key pairs) will be treated as independent point queries. It's also not very feasible to test a network service: I can test it on a small scale, but I can't really replicate real-world use cases. – xxbbcc Oct 29 '22 at 02:46
  • I don't believe they are treated as independent point queries, but I would expect the indexing to be just as good. Even assuming the aggregate cost were identical, it could still be a tradeoff between how many parallel point queries your application can tolerate and the efficiency of waiting for a single larger request to be processed and returned. – Christopher Scott Nov 01 '22 at 19:26
  • The application can tolerate a _lot_ of independent point queries; they all happen on the Microsoft side, after all :) But you sort of confirmed what I saw hinted at in other blogs/posts: even though it should be trivial to treat these as highly optimizable sub-queries, they probably aren't, so point queries are better. – xxbbcc Nov 01 '22 at 19:44

I settled on parallelizing point queries for this scenario, and it has given good results. I have heavy-burst read scenarios (I may have tens or hundreds of thousands of lookups to do against hundreds of millions of records). I prefer that over a query with a series of ORs, as those tended to give worse throughput (I don't have any stats to hand now).

For me, parallelization happens in two ways:

  1. lower level: awaiting a batch of Tasks, each making an individual point query
  2. higher level: architecting a particularly heavy workload to scale out over multiple instances, each making parallel queries via (1)
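The lower-level approach in (1) can be sketched as follows: start one point lookup per key pair, cap how many are in flight with a SemaphoreSlim, and await the whole batch with Task.WhenAll. This is a minimal sketch under assumptions, not the answerer's code: the helper name LookupAllAsync, the concurrency cap and the in-memory fake table are hypothetical, and the lookup is passed in as a delegate so the sketch runs without a storage account. With Azure.Data.Tables, the delegate would wrap TableClient.GetEntityAsync<TableEntity>, which throws RequestFailedException with Status 404 when the entity does not exist.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: one point lookup per key pair, at most maxConcurrency in flight at once.
// The lookup delegate returns null for a missing entity; missing entities are left out of the result.
static async Task<List<T>> LookupAllAsync<T>(
    IEnumerable<(string PartitionKey, string RowKey)> keys,
    Func<string, string, CancellationToken, Task<T?>> lookup,
    int maxConcurrency,
    CancellationToken cancellationToken = default) where T : class
{
    using var gate = new SemaphoreSlim(maxConcurrency);

    async Task<T?> LookupOneAsync((string PartitionKey, string RowKey) key)
    {
        await gate.WaitAsync(cancellationToken);
        try { return await lookup(key.PartitionKey, key.RowKey, cancellationToken); }
        finally { gate.Release(); }
    }

    // Start every lookup, then await the whole batch; WhenAll keeps results in input order.
    T?[] results = await Task.WhenAll(keys.Select(k => LookupOneAsync(k)));
    return results.OfType<T>().ToList();
}

// With Azure.Data.Tables the delegate could look like this (not compiled here):
//   async (pk, rk, ct) => {
//       try { return (await tableClient.GetEntityAsync<TableEntity>(pk, rk, cancellationToken: ct)).Value; }
//       catch (RequestFailedException ex) when (ex.Status == 404) { return null; }
//   }

// Demo with an in-memory stand-in for the table (hypothetical data).
var fakeTable = new Dictionary<(string, string), string>
{
    [("p1", "r1")] = "entity-1",
    [("p2", "r2")] = "entity-2",
};
var found = await LookupAllAsync<string>(
    new[] { ("p1", "r1"), ("p2", "r2"), ("p3", "missing") },
    (pk, rk, ct) => Task.FromResult(fakeTable.TryGetValue((pk, rk), out var e) ? e : null),
    maxConcurrency: 8);
Console.WriteLine(string.Join(", ", found));
```

The concurrency cap is the knob for the tradeoff mentioned in the comments above: it bounds how many requests one instance issues at a time, while (2) scales the total by adding instances.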
AdaTheDev