I got a 16GB collection with 2 partitions. When I query a document by it's Id, it is very slow. But querying by an indexed field is fast. Both are cross-partition query and if I pass partition key with the query it is fast but partition key is not always available for my query. Got similar results using .NET SDK and Document Explorer Query in Azure Portal.
The collection has custom indexing policy but as far as I know you don't need to index Id
or it may not be even possible.
Here are my queries and their corresponding request charges.
SELECT * FROM c where c.id = 'unique-id-123'
-- Request Charge: 344940.79 RUs, Document Count: 1
SELECT * FROM c WHERE c.otherId = 'NOT-so-uniqueId-123'
-- Request Charge: 5.08 RUs, Document Count: 3
As you know, Id is unique so the query returns 1 document while the second query is filtered by otherId
which is not so unique and returns 3 documents. Also notice the insanely high RUs consumption with the first query.
So why the 2nd query is faster than by Id?
Update:
Here are the gathered metrics for the above queries.
Query By Id:
Read 1 records in 1497 ms, 339173.109 RU, Size: 6873022 KB
QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0,
PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0,
RuntimeExecutionTimes = 356, WriteOutputTime = 0
Query by indexed field:
Read 4 records in 2 ms, 7.56 RU, Size: 9 KB
QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0,
PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1,
RuntimeExecutionTimes = 0, WriteOutputTime = 0
These proves that query by Id is doing table scan as most of the time spent was from DocumentLoadTime
and no value for IndexLookupTime
.
But I thought Id should be the primary key and indexed by default as per this answer by @andrew-liu.