Getting multiple entities using Azure TableStorage over multiple partitions

Question

I'm using Azure.Data.Tables (12.6.1) and I need to query a single record from multiple partitions of a single table (so the result would be multiple records, 1 from each partition). Each entity needs to be looked up by its partition key and row key - for a single TableClient.GetEntity() call this would be a point query.

After reading the documentation, I'm unsure whether it's efficient to call TableClient.QueryAsync() with multiple partition key / row key pairs, and the search results I found give contradictory suggestions.

Is it efficient to do this (for a number of partition key / row key combinations, up to ~50) or is it just better to call GetEntity() one by one, for each entity?

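// Table service filter operators are documented in lowercase ("and", "or").
var filter = "(PartitionKey eq 'p1' and RowKey eq 'r1') or " +
    "(PartitionKey eq 'p2' and RowKey eq 'r2') or ...";
// QueryAsync<T> returns an AsyncPageable<T>; enumerate it with await foreach instead of awaiting it.
var results = tableClient.QueryAsync<TableEntity>(filter, 500, null, cancelToken);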
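
For reference, a filter like the one above could be assembled from a list of key pairs and enumerated roughly as follows. This is only a sketch: the `MultiKeyQuery` class and `QueryPairsAsync` method are hypothetical names, and it says nothing about whether the service handles such a filter efficiently. It relies on `TableClient.CreateQueryFilter` to quote the key values and on `await foreach` to walk the `AsyncPageable<T>` that `QueryAsync<T>` returns.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure.Data.Tables;

public static class MultiKeyQuery // hypothetical helper, not part of the SDK
{
    public static async Task<List<TableEntity>> QueryPairsAsync(
        TableClient tableClient,
        IEnumerable<(string PartitionKey, string RowKey)> keys,
        CancellationToken cancelToken = default)
    {
        var results = new List<TableEntity>();
        var keyList = keys.ToList();

        // An empty filter would match every entity in the table, so bail out early.
        if (keyList.Count == 0)
        {
            return results;
        }

        // CreateQueryFilter quotes and escapes the interpolated string values.
        string filter = string.Join(" or ", keyList.Select(k =>
            "(" + TableClient.CreateQueryFilter(
                $"PartitionKey eq {k.PartitionKey} and RowKey eq {k.RowKey}") + ")"));

        // QueryAsync<T> is not awaited directly; pages are fetched as the loop advances.
        await foreach (TableEntity entity in tableClient.QueryAsync<TableEntity>(
            filter, maxPerPage: 500, cancellationToken: cancelToken))
        {
            results.Add(entity);
        }

        return results;
    }
}
```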

Use a concat instead of an OR. – jdweng Oct 26 '22 at 17:14
@jdweng what do you mean by that? – xxbbcc Oct 26 '22 at 17:16

score 1 · Answer 1 · answered Oct 28 '22 at 19:36

I don't know if there is a definitive answer here, as it probably depends on your specific requirements. I would suggest testing different options and tuning accordingly.

For reference, here is a general guide to query performance for tables: https://learn.microsoft.com/azure/storage/tables/table-storage-design-for-query

Christopher Scott


Thank you - I'm aware of that page. Unfortunately it doesn't describe if multiple query parameters (partition key+row key pairs) will be treated as independent point queries or not. It's also not very feasible to test a network service - I can test it on a small scale but I can't really replicate real world use cases. – xxbbcc Oct 29 '22 at 02:46
I don't believe they are treated as independent point queries, but I would expect the indexing to be just as good. But assuming the cost per query were identical in aggregate, it could still be a tradeoff between how many parallel point queries your application could tolerate vs the efficiency of waiting for a single larger request to be processed and returned. – Christopher Scott Nov 01 '22 at 19:26
The application can tolerate a _lot_ of independent point queries - they all happen on the Microsoft side, after all :) But you sort of confirmed what I saw hinted at in other blogs/posts - even though it should be trivial to treat these as highly optimizable sub-queries, they probably aren't so point queries are better. – xxbbcc Nov 01 '22 at 19:44

score 1 · Answer 2 · answered Nov 09 '22 at 14:48

I settled on parallelizing point queries for this scenario, and it has given good results. I have heavy-burst read scenarios (I may have tens or hundreds of thousands of lookups to do against hundreds of millions of records). I prefer that over a query with a series of ORs, as those tended to give worse throughput (I don't have any stats to hand now...).

For me, parallelization happens through 2 means:

1) lower level: awaiting a batch of Tasks, each making an individual point query
2) higher level: architecting a particularly heavy workload to scale out over multiple instances, each making parallel queries via 1)
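
The lower-level approach might look something like the sketch below. It is not the code actually used here: the `PointLookup` class, the `GetManyAsync` method, and the default `maxConcurrency` of 16 are hypothetical names and values picked for illustration. Each key pair gets its own `TableClient.GetEntityAsync<T>` call, a `SemaphoreSlim` caps the number of requests in flight, and a `RequestFailedException` with status 404 is treated as "entity not found".

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public static class PointLookup // hypothetical helper, not part of the SDK
{
    // Runs one point query per (PartitionKey, RowKey) pair,
    // with at most maxConcurrency requests in flight at a time.
    public static async Task<IReadOnlyList<TableEntity>> GetManyAsync(
        TableClient tableClient,
        IEnumerable<(string PartitionKey, string RowKey)> keys,
        int maxConcurrency = 16, // assumed value; tune for your workload
        CancellationToken cancelToken = default)
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);

        async Task<TableEntity?> GetOneAsync(string partitionKey, string rowKey)
        {
            await throttle.WaitAsync(cancelToken);
            try
            {
                Response<TableEntity> response = await tableClient.GetEntityAsync<TableEntity>(
                    partitionKey, rowKey, cancellationToken: cancelToken);
                return response.Value;
            }
            catch (RequestFailedException ex) when (ex.Status == 404)
            {
                return null; // no entity with this key pair
            }
            finally
            {
                throttle.Release();
            }
        }

        // Start all lookups; the semaphore decides how many actually run concurrently.
        TableEntity?[] found = await Task.WhenAll(
            keys.Select(k => GetOneAsync(k.PartitionKey, k.RowKey)));

        // Drop the lookups that returned nothing.
        return found.Where(e => e != null).Select(e => e!).ToList();
    }
}
```

Capping concurrency is a design choice rather than a requirement: with very large bursts, an unbounded `Task.WhenAll` can exhaust sockets or trigger throttling, so the limit is worth measuring against the real workload.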

Thank you for the answer - I ended up doing the same. – xxbbcc Nov 09 '22 at 15:04
