I'm struggling to find a code example from MS for the V3 SDK for queries with paging. They provide examples for V2, but that SDK is a completely different code base that uses the "CreateDocumentQuery" method.

I've tried searching through GitHub here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos.Samples/Usage/Queries/Program.cs

I believe I'm looking for a method example using continuation tokens, with the assumption that if I cache the previously used continuation tokens in my web app then I can page backwards as well as forwards?
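
To illustrate the caching idea, here is a minimal sketch. The `PageTokenHistory` class and its members are hypothetical names, not part of the Cosmos SDK. It assumes the web app can keep per-user state (for example in session) and that the token returned with page N is the one that fetches page N+1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Cosmos SDK): remembers the continuation
// token that was used to request each page, so earlier pages can be re-fetched.
public sealed class PageTokenHistory
{
    // _tokens[i] is the token needed to fetch page i; page 0 always uses null.
    private readonly List<string> _tokens = new() { null };

    public int CurrentPage { get; private set; }

    // Token to pass to the query method when fetching the current page.
    public string CurrentToken => _tokens[CurrentPage];

    public bool CanGoBack => CurrentPage > 0;

    // Call after fetching the current page, passing the token Cosmos returned.
    // Returns false when there is no further page to move to.
    public bool MoveNext(string returnedToken)
    {
        if (string.IsNullOrEmpty(returnedToken)) return false;

        if (CurrentPage + 1 < _tokens.Count)
            _tokens[CurrentPage + 1] = returnedToken; // refresh a page visited before
        else
            _tokens.Add(returnedToken);

        CurrentPage++;
        return true;
    }

    // Step back one page; the stored token re-fetches that page from its start.
    public bool MovePrevious()
    {
        if (!CanGoBack) return false;
        CurrentPage--;
        return true;
    }
}
```

In use, the app would fetch a page with `CurrentToken`, then call `MoveNext` with the token that came back when the user clicks "next", or `MovePrevious` when they click "back".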

I'm also not quite following MS's explanation that MaxItemCount doesn't actually mean it will only try to return X items, but simply limits the number of items in each search across each partition. Confused!

Can anyone point me to the right place for a code example, please? I also tried searching through https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-pagination but it appears to cover the older SDK (V2, I believe).

UPDATE (following comments from Gaurav below)

        public async Task<(List<T>, string)> QueryWithPagingAsync(string query, int pageSize, string continuationToken)
        {
            try
            {
                Container container = GetContainer();
                List<T> entities = new(); // Create a local list of type <T> objects.
                QueryDefinition queryDefinition = new QueryDefinition(query);

                using FeedIterator<T> resultSetIterator = container.GetItemQueryIterator<T>(
                query, // SQL Query passed to this method.
                continuationToken, // Value is always null for the first run.
                requestOptions: new QueryRequestOptions()
                {
                    // Optional if we already know the partition key value.
                    // Not relevant here because we're passing <T>, which could
                    // be any model class passed to the generic method.
                    //PartitionKey = new PartitionKey("MyPartitionKeyValue"),

                    // This does not actually limit how many documents are returned if
                    // what we're querying resides across multiple partitions.
                    // If we set the value to 1, then control the number of times
                    // the loop below performs the ReadNextAsync, then we can control
                    // the number of items we return from this method. I'm not sure
                    // whether this is best way to go, it seems we'd be calling
                    // the API X no. times by the number of items to return? 
                    MaxItemCount = 1 
                });

                // Set var i to zero, we'll use this to control the number of iterations in 
                // the loop, then once i is equal to the pageSize then we exit the loop.
                // This allows us to limit the number of documents to return (hope this is the best way to do it)
                var i = 0; 

                while (resultSetIterator.HasMoreResults && i < pageSize)
                {
                    FeedResponse<T> response = await resultSetIterator.ReadNextAsync();
                    entities.AddRange(response);
                    continuationToken = response.ContinuationToken;
                    i++; // Add 1 to var i in each iteration.
                }
                return (entities, continuationToken);
            }
            catch (CosmosException ex)
            {
                //Log.Error($"Entities was not retrieved successfully - error details: {ex.Message}");

                if (ex.StatusCode == HttpStatusCode.NotFound)
                {
                    return (null, null);
                }
                else { throw; }
            }
        }

The above method is my latest attempt. Whilst I'm able to use and return continuation tokens, the next challenge is how to control the number of items returned from Cosmos. In my environment, the method lives in a repository that receives model classes from different calling methods, so hard-coding the partition key is not practical, and I'm struggling to configure the number of items returned. The method does in fact control the number of items I return to the calling method further up the chain, but I'm worried that my approach results in multiple calls to Cosmos. That is, if I set the page size to 1000 items, am I making an HTTP call to Cosmos 1000 times?

I was looking at a thread here https://stackoverflow.com/questions/54140814/maxitemcount-feed-options-property-in-cosmos-db-doesnt-work but I'm not sure the answer in that thread is a solution. Given that I'm using the V3 SDK, there does not seem to be a "PageSize" parameter available in the request options.

However, I also found an official Cosmos code sample here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos.Samples/Usage/Queries/Program.cs#L154-L186 (see example method "QueryItemsInPartitionAsStreams", line 171), and it looks like they have used a similar pattern, i.e. setting MaxItemCount to 1 and then controlling the number of items returned in the loop before exiting. I'd just like to understand better what impact, if any, this might have on the RUs and API calls to Cosmos.

OJB1

3 Answers

The solution:

Summary:

From the concerns raised in my question and taking note from Gaurav Mantri's comments, if we are fetching the items from Cosmos in a loop then the MaxItemCount does not actually limit the total number of results returned but simply limits the number of results per request. If we continue to fetch more items in the loop then we end up with more results returned than what the user may want to retrieve.

In my case, the reason for paging is to present the items back to the web app using a Razor list view, but we want to be able to set the maximum number of results returned per page.

The solution below checks the count of items returned on each iteration of the loop. If that count is less than or equal to the MaxItemCount value, we break from the loop with at most our set maximum number of items, plus the continuationToken that we can use on the next method run.

I have tested the method with continuation tokens and am able to effectively page backwards and forwards. The key difference from the code example in my original question is that we're only calling Cosmos DB once to get the desired number of results back, as opposed to limiting the request to one item per run and having to run multiple requests.

public async Task<(List<T>, string)> QueryWithPagingAsync(string query, int pageSize, string continuationToken)
{
    string unescapedContinuationToken = null;
    if (!String.IsNullOrEmpty(continuationToken)) // Check if null before unescaping.
    {
        unescapedContinuationToken = Regex.Unescape(continuationToken); // Needed in my case...
    }

    try
    {
        Container container = GetContainer();
        List<T> entities = new(); // Create a local list of type <T> objects.
        QueryDefinition queryDefinition = new(query); // Create the query definition.

        using FeedIterator<T> resultSetIterator = container.GetItemQueryIterator<T>(
        query, // SQL Query passed to this method.
        unescapedContinuationToken, // Value is always null for the first run.
        requestOptions: new QueryRequestOptions()
        {
            // MaxItemCount does not actually limit how many documents are returned
            // from Cosmos, if what we're querying resides across multiple partitions.
            // However this parameter will control the max number of items
            // returned on 'each request' to Cosmos.
            // In the loop below, we check the Count of the items returned
            // on each iteration of the loop and if we have achieved less than or 
            // equal to the MaxItemCount value then we break from the loop with
            // our set maximum number of items and the continuationToken
            // that we can use on the next method run.
            // 'pageSize' is the max no. items we want to return for each page in our list view.
            MaxItemCount = pageSize, 
        });

        while (resultSetIterator.HasMoreResults)
        {
            FeedResponse<T> response = await resultSetIterator.ReadNextAsync();
            entities.AddRange(response);
            continuationToken = response.ContinuationToken;

            // After the first iteration, we get the count of items returned.
            // Now we'll either return the exact number of items that was set
            // by the MaxItemCount, OR we may find there were fewer results than
            // the MaxItemCount, but either way after the first run, we should
            // have the number of items returned that we want, or at least
            // the maximum number of items we want to return, so we break from the loop.
            if (response.Count <= pageSize) { break; }
        }
        return (entities, continuationToken);
    }
    catch (CosmosException ex)
    {
        //Log.Error($"Entities was not retrieved successfully - error details: {ex.Message}");

        if (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return (null, null);
        }
        else { throw; }
    }
}
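
The Regex.Unescape call above is there because the token reached the method with escaped quotes. As an alternative, here is a minimal sketch, assuming the token has to travel through a URL or form field between requests: encode it as URL-safe Base64 before sending it to the page, and decode it before passing it to the query. `ContinuationTokenCodec` is a hypothetical helper name, not an SDK type.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an SDK type): round-trips a continuation token
// through URL-safe Base64 so that quotes, '+', '#' and '=' survive transport.
public static class ContinuationTokenCodec
{
    public static string Encode(string token) =>
        token is null
            ? null
            : Convert.ToBase64String(Encoding.UTF8.GetBytes(token))
                .TrimEnd('=')        // padding is restored in Decode
                .Replace('+', '-')   // '+' and '/' are not URL-safe
                .Replace('/', '_');

    public static string Decode(string encoded)
    {
        if (string.IsNullOrEmpty(encoded)) return null; // first page: no token

        string base64 = encoded.Replace('-', '+').Replace('_', '/');
        // Restore the '=' padding so the length is a multiple of 4.
        base64 = base64.PadRight(base64.Length + (4 - base64.Length % 4) % 4, '=');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

The calling code would return `ContinuationTokenCodec.Encode(continuationToken)` to the view and pass `ContinuationTokenCodec.Decode(...)` of the posted value into the query method, leaving the token itself untouched.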
OJB1

Please try the following code. It fetches all documents from a container with a maximum of 100 documents in a single request.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

namespace CosmosDbSQLAPISamples
{
    class Program
    {
        private static string connectionString =
            "AccountEndpoint=https://account-name.documents.azure.com:443/;AccountKey=account-key==;";
        private static string databaseName = "database-name";
        private static string containerName = "container-name";
        static async Task Main(string[] args)
        {
            CosmosClient client = new CosmosClient(connectionString);
            Container container = client.GetContainer(databaseName, containerName);
            string query = "Select * From Root r";
            string continuationToken = null;
            int pageSize = 100;
            do
            {
                var (entities, item2) = await GetDataPage(container, query, continuationToken, pageSize);
                continuationToken = item2;
                Console.WriteLine($"Total entities fetched: {entities.Count}; More entities available: {!string.IsNullOrWhiteSpace(continuationToken)}");
            } while (continuationToken != null);
        }

        private static async Task<(List<dynamic>, string)> GetDataPage(Container container, string query, string continuationToken, int pageSize)
        {
            List<dynamic> entities = new(); // Local list to collect this page's documents.
            QueryDefinition queryDefinition = new QueryDefinition(query);
            QueryRequestOptions requestOptions = new QueryRequestOptions()
            {
                MaxItemCount = pageSize
            };
            // Dispose the iterator once the single page has been read.
            using FeedIterator<dynamic> resultSetIterator = container.GetItemQueryIterator<dynamic>(queryDefinition, continuationToken, requestOptions);
            FeedResponse<dynamic> response = await resultSetIterator.ReadNextAsync();
            entities.AddRange(response);
            continuationToken = response.ContinuationToken;
            return (entities, continuationToken);
        }
    }
}

UPDATE

I think I understand your concerns now. Essentially there are two things you would need to consider:

  1. MaxItemCount - This is the maximum number of documents that will be returned by Cosmos DB in a single request. Please note that you can get anywhere from 0 to the value specified for this parameter. For example, if you specify 100 as MaxItemCount you can get anywhere from 0 to 100 documents in a single request.
  2. FeedIterator - It keeps track of the continuation token internally. Based on the response received, it sets HasMoreResults to true if a continuation token is found and to false otherwise. The default value for HasMoreResults is true.

Now coming to your code, when you do something like:

while (resultSetIterator.HasMoreResults)
{
    //some code here...
}

Because FeedIterator keeps track of the continuation token, this loop will return all the documents that match the query. If you notice, in my code I am not using this logic. I simply send the request once and then return the result.

I think setting MaxItemCount to 1 is a bad idea. If you want to fetch, say, 100 documents, then you're making a minimum of 100 requests to your Cosmos DB account. If you have a hard need to get exactly 100 (or any fixed number of) documents from your API, you can implement your own pagination logic. For example, please see the code below. It fetches a total of 1000 documents with a maximum of 100 documents in a single request.

static async Task Main(string[] args)
{
    CosmosClient client = new CosmosClient(connectionString);
    Container container = client.GetContainer(databaseName, containerName);
    string query = "Select * From Root r";
    string continuationToken = null;
    int pageSize = 100;
    int maxDocumentsToFetch = 1000;
    List<dynamic> documents = new List<dynamic>();
    do
    {
        var numberOfDocumentsToFetch = Math.Min(pageSize, maxDocumentsToFetch);
        var (entities, item2) = await GetDataPage(container, query, continuationToken, numberOfDocumentsToFetch);
        continuationToken = item2;
        Console.WriteLine($"Total entities fetched: {entities.Count}; More entities available: {!string.IsNullOrWhiteSpace(continuationToken)}");
        maxDocumentsToFetch -= entities.Count;
        documents.AddRange(entities);
    } while (maxDocumentsToFetch > 0 && continuationToken != null);
}
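
The request-count reasoning above comes down to simple arithmetic, sketched here. `PagingMath.MinReadCalls` is a hypothetical helper, and the result is only a lower bound, because any single request may return fewer items than MaxItemCount. It assumes a positive MaxItemCount.

```csharp
using System;

// Hypothetical helper: lower bound on the number of ReadNextAsync calls needed
// to collect itemsWanted documents when each call returns at most maxItemCount.
public static class PagingMath
{
    public static int MinReadCalls(int itemsWanted, int maxItemCount)
    {
        if (itemsWanted < 0) throw new ArgumentOutOfRangeException(nameof(itemsWanted));
        if (maxItemCount < 1) throw new ArgumentOutOfRangeException(nameof(maxItemCount));

        // Ceiling division, done in long to avoid overflow near int.MaxValue.
        return (int)(((long)itemsWanted + maxItemCount - 1) / maxItemCount);
    }
}
```

With MaxItemCount = 1, `MinReadCalls(1000, 1)` gives 1000 calls, while `MinReadCalls(1000, 100)` gives 10, which matches the loop above when every request returns a full page.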
Gaurav Mantri
  • Hi Gaurav, thanks for your reply. In your bottom method, I see you're returning the continuationToken string separately along with the list of entities. Is the continuationToken already included in the entities list because you derived the list of entities from the raw FeedResponse response, as opposed to returning "FeedResponse response.Resource" ? I will do some testing this eve, because I'm using my method in a repo then I'd have to adapt it to my environment, but I see in general what your code is doing, thanks – OJB1 Aug 04 '21 at 07:56
  • `Is the continuationToken already included in the entities list because you derived the list of entities from the raw FeedResponse response, as opposed to returning "FeedResponse response.Resource"` - That's correct. You can return `FeedResponse` and handle documents and continuation token in the calling method. – Gaurav Mantri Aug 04 '21 at 09:40
  • Hi, I'm almost there but I'm finding that the MaxItemCount = pageSize value is not being respected, the API returns all items in the container, this also means that the continuationToken is set to null on the last iteration. Just trying to figure out why this would be. I've checked in Debug to ensure that this variable does have the value set, which in my initial testing I'm trying to return just 1 item i.e. pageSize of 1. – OJB1 Aug 04 '21 at 20:34
  • Just found this thread: https://stackoverflow.com/questions/54140814/maxitemcount-feed-options-property-in-cosmos-db-doesnt-work So the MaxItemCount is not what I thought it was, baah! bit more work to do... – OJB1 Aug 04 '21 at 20:43
  • please see my edits in the question, welcome your thoughts, thanks – OJB1 Aug 04 '21 at 21:30
  • Updated my answer. HTH. – Gaurav Mantri Aug 05 '21 at 02:44
  • Hi, I'm struggling to understand your methodology. In your example, if the first run returns more entities than the user wants, i.e. you returned 1000 items but the user only wanted a set maximum of 100, then how does this work with the continuation token that is now set on the basis of returning 1000 items? Would the next run continue at 1001 onwards as opposed to 101 onwards? Thanks – OJB1 Aug 05 '21 at 18:15
  • In my code, 1000 is the total number of documents requested by user. The code asks for just 100 documents from Cosmos DB in a single request. Assuming Cosmos DB returns exactly 100 items in each request, my code will call GetDataPage method exactly 10 times. If the user only wants 100, you set maxDocumentsToFetch to 100 and the code will loop only once. I hope this clarifies things for you. – Gaurav Mantri Aug 05 '21 at 18:47
  • Hi, I've just figured that out this second by doing some more testing. In my loop, it looks to be that I can simply exit the loop once I've seen that FeedResponse response.Count <= pageSize. Will keep testing, many thanks – OJB1 Aug 05 '21 at 18:53
  • Having an issue with the continuationToken. I tried JsonConvert.ToString(continuationToken); to add the escapes and then passed the result to the method, but I get the error message "Encountered an unexpected JSON token". My token looks like this: {\"token\":\"+RID:~xm0qALn5EaIGAAAAAAAAAA==#RT:1#TRC:2#ISV:2#IEO:65567#QCF:7#FPC:AQYAAAAAAAAAGAAAAAAAAAA=\",\"range\":{\"min\":\"\",\"max\":\"FF\"}} – OJB1 Aug 05 '21 at 19:50

In code: var sqlQueryText = $"SELECT * FROM c OFFSET {offset} LIMIT {limit}";

However, this is more expensive (more RUs) than using a continuation token.

When using OFFSET/LIMIT, the Azure Cosmos SDK still uses a continuation token in the background to get all the results.
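
As a minimal sketch of how a page number could be turned into OFFSET/LIMIT values: `OffsetLimitQuery.Build` is a hypothetical helper, and it assumes 1-based page numbers and a base query that does not already end in an OFFSET/LIMIT clause. Only integers are appended, so no user-supplied text ends up in the query string.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: appends an OFFSET/LIMIT clause for a 1-based page number.
public static class OffsetLimitQuery
{
    public static string Build(string baseQuery, int pageNumber, int pageSize)
    {
        if (string.IsNullOrWhiteSpace(baseQuery))
            throw new ArgumentException("A base query is required.", nameof(baseQuery));
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Page 1 skips nothing, page 2 skips one page of items, and so on.
        long offset = (long)(pageNumber - 1) * pageSize;

        return baseQuery.TrimEnd()
            + " OFFSET " + offset.ToString(CultureInfo.InvariantCulture)
            + " LIMIT " + pageSize.ToString(CultureInfo.InvariantCulture);
    }
}
```

For example, `OffsetLimitQuery.Build("SELECT * FROM c", 3, 20)` produces `SELECT * FROM c OFFSET 40 LIMIT 20`. The skipped items still have to be processed on each request, which is consistent with the higher RU cost mentioned above for deep pages.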

Leszek P