I have recently started using Azure Cosmos DB in our project. For the reporting purpose, we need to get all the Partition Keys in the collection. I could not find any suitable API to achieve it.
Asked
Active
Viewed 1.6k times
1 Answers
9
UPDATE: According to Brian in the comments below, DISTINCT
is now supported. Try something like:
SELECT DISTINCT c.partitionKey FROM c
Prior answer: Idea that could work but for one thing...
The only way to get the actual partition key values is to do a unique aggregate on that field.
You can directly hit the REST endpoint at https://{your endpoint domain}.documents.azure.com/dbs/{your collection's uri fragment}/pkranges
to pull back the minInclusive
and maxExclusive
ranges for each partition but those are hash space ranges and I don't know how to convert those into partition key values nor do a fanout using the actual minInclusive hash.
Also, there is a slim possibility that the pkranges can change between the time you retrieve them and the time you go to do something with them.

Larry Maccherone
- 9,393
- 3
- 27
- 43
-
we can get the partition key for cosmosdb collection – Atul Chaudhary Oct 27 '17 at 08:10
-
How do you do it? – Larry Maccherone Oct 27 '17 at 10:19
-
It's sort of amazing there isn't a direct way to read PKs given their huge importance. Unless I'm missing something, retrieving the ranges gives you range IDs, which are only useful if you want to run the same unique aggregate query on each range individually (by setting the range ID on a `FeedOption`). It doesn't really get you any closer to the underlying values. For what it's worth, though, the .NET SDK does let you query ranges via `ReadPartitionKeyRangeFeedAsync`. – McGuireV10 Nov 16 '17 at 14:12
-
That first sentence is pretty much all you need. There's (unofficial) support for DISTINCT based on my own testing, which means you can do something like 'SELECT DISTINCT c.partitionKey FROM c' and get yourself a list. – Brian May 24 '18 at 21:29
-
@Brian, that is new since this was written. I've updated my answer. – Larry Maccherone May 25 '18 at 23:35
-
1The problem with `SELECT DISTINCT c.partitionKey FROM c` is that it's an expensive query (321 RU in a pretty small database with 6 partitions) – Thomas Levesque Sep 17 '18 at 11:08
-
When I ran the query on my collection (size: 25GB; partition count: 1000), it costed about 5000 RU's while returning only 2 partition keys. Bad! – TJ Galama Oct 03 '19 at 13:58
-
Yes, the `SELECT DISTINCT...` approach is going to chew up some RUs. I haven't tried it in a while but I suspect the pkranges trick still works. – Larry Maccherone Oct 04 '19 at 14:13
-
further to @ThomasLevesque 's comment, this query quickly starts running very slowly as partition keys increase, if someone from the COSMOS team could chime in with thoughts on querying for all distinct partition keys that would be grand. – blomster Nov 17 '21 at 15:40
-
@blomster, have you tried my original answer? It doesn't suffer from the scaling issues that the `SELECT DISTINCT...` approach does – Larry Maccherone Nov 18 '21 at 16:21
-
1The `pkranges` stuff is entirely incorrect and you clearly never actually tried to use it to get the distinct list of logical partition keys. This returns a document for every **physical** partition - and the minInclusive and maxExclusive are just binary values that denote the range of the hash space that this partition is for. e.g. `"minInclusive": "05C1EF0F87C3E0", "maxExclusive": "FF",` - there is no way to determine the actual partitionKey values that exist in the data from that – Martin Smith Nov 19 '21 at 13:47