2

We are storing a folder tree, the number of items is huge so we have created a partition on the parent folder.

When we issue queries such as

SELECT * FROM root WHERE root.parentPath = "\\server\share\shortpath" AND root.isFile 

The RUs is very low and the performance is very good.

But, when we have a long path eg

SELECT * FROM root WHERE root.parentPath = "\\server\share\a very\long\path\longer\than\this" AND root.isFile

The RUs go up to 5000 and the performance suffers.

parentPath works well as a partition key as all queries include this field in the filter.

If I add another clause to the query it also becomes very fast, eg if I do something like and root.name = 'filename'

It's almost like it's scanning the entire partition based on the hash that's derived from it.

The Query returns NO DATA

which is fine as its someone looking for child folders under a given node, once you get deep it just gets very slow.

Query Metrics

x-ms-documentdb-query-metrics: 

totalExecutionTimeInMs=1807.61;
queryCompileTimeInMs=0.08;
queryLogicalPlanBuildTimeInMs=0.04;
queryPhysicalPlanBuildTimeInMs=0.06;
queryOptimizationTimeInMs=0.01;
VMExecutionTimeInMs=1807.11;
indexLookupTimeInMs=0.65;
documentLoadTimeInMs=1247.08;
systemFunctionExecuteTimeInMs=0.00;
userFunctionExecuteTimeInMs=0.00;
retrievedDocumentCount=72554;
retrievedDocumentSize=59561577;
outputDocumentCount=0;
outputDocumentSize=49;
writeOutputTimeInMs=0.00;
indexUtilizationRatio=0.00

From string

x-ms-documentdb-query-metrics: totalExecutionTimeInMs=1807.61;queryCompileTimeInMs=0.08;queryLogicalPlanBuildTimeInMs=0.04;queryPhysicalPlanBuildTimeInMs=0.06;queryOptimizationTimeInMs=0.01;VMExecutionTimeInMs=1807.11;indexLookupTimeInMs=0.65;documentLoadTimeInMs=1247.08;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=72554;retrievedDocumentSize=59561577;outputDocumentCount=0;outputDocumentSize=49;writeOutputTimeInMs=0.00;indexUtilizationRatio=0.00
Steve Drake
  • 1,968
  • 2
  • 19
  • 41
  • I'm curious, are you specifying the partition key value on the ReqeustOptions level? It shouldn't matter but still wanna know. – Nick Chapsas Feb 01 '19 at 14:15
  • I tried with and without. Thanks – Steve Drake Feb 01 '19 at 14:42
  • 1
    What are the query metrics you see from this query? Could you share them with me? chrande (at symbol) microsoft (dot) com? Doc on query metrics: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metrics – Chris Anderson Feb 01 '19 at 18:43

1 Answers1

4

This is because of a path length limit in Indexing v1.

We have increased the path length limit to a larger value in the new index layout, therefore migrating the collections to this new layout would fix the issue and provide many performance benefit.

We have rolled out the new index layout for new collections by default. If it is possible for you to recreate the current collection and migrate existing data over there, it would be great. Otherwise, an alternative is to trigger the migration process to move existing collections to the new index layout. The following C# method can be used to do that:

static async Task UpgradeCollectionToIndexV2Async(

        DocumentClient client,

        string databaseId,

        string collectionId)

    {

        DocumentCollection collection = (await client.ReadDocumentCollectionAsync(string.Format("/dbs/{0}/colls/{1}", databaseId, collectionId))).Resource;

        collection.SetPropertyValue("IndexVersion", 2);

        ResourceResponse<DocumentCollection> replacedCollection = await client.ReplaceDocumentCollectionAsync(collection);

        Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "Upgraded indexing version for database {0}, collection {1} to v2", databaseId, collectionId));

    }

It could take several hours for the migration to complete, depending on the amount of data in the collection. The issue should be addressed once it is completed.

(This was copy pasted from an email conversation we had to resolve this issue)

Chris Anderson
  • 8,305
  • 2
  • 29
  • 37