5

Goal:

To return the count of distinct values of a particular document field within a partition of a Cosmos collection.

Steps:

If I run the following query on my Azure Cosmos database,

SELECT DISTINCT c.field
FROM c
WHERE c.field = 'abc' AND c.partitionKeyField = '123'

I get one row as expected, e.g. the following response

[
    {
        "field": "abc"
    }
]

However, if I then run the following query in an attempt to count the number of distinct documents in the response, via the following query

SELECT VALUE COUNT(1)
FROM (
    SELECT DISTINCT c.field
    FROM c
    WHERE c.field = 'abc' AND c.partitionKeyField = '123'
)

It returns

[
    6
]

This is the total number of documents with c.field set to "abc" rather than the number of distinct values of c.field.

Question:

Please could you help me understand why the query returns the number of documents rather the number of distinct values for c.field, and if there is a query which will return the number of distinct values of c.field, i.e. 1?

Edit - PS. I know this a contrived example as by definition the number of unique values of c.field is always 1 - I have deliberately simplified this from the real case.

user1857450
  • 551
  • 8
  • 20
  • Interestingly `GROUP BY` returns what I'd like, e.g. ```sql SELECT d.field, COUNT(1) FROM ( SELECT DISTINCT c.field FROM c WHERE c.field = 'abc' AND c.partitionKeyField = '123' ) AS d GROUP BY d.field ``` Returns ```json [ { "field": "abc", "$1": 1 } ] ``` – user1857450 May 16 '20 at 22:42
  • 1
    This type of query is not reliable in my experience. I reported something similar [here](https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/38610298-bug-incorrect-results-returned-from-count-distinc) and got a ludicrous response that totally missed the point but raised it again through other channels and think it is on the radar to be fixed at some point... – Martin Smith May 16 '20 at 22:46
  • Thanks. Good to know I should avoid using this query. Your link was one of the few pages I found with someone with a similar issue. Did you solve it with a `GROUP BY` like in the comment above, or something else? – user1857450 May 16 '20 at 22:52
  • I think I just concluded that I was out of luck trying to run this on CosmosDb and didn't come up with any workaround. Hopefully the `group by` works for you! – Martin Smith May 16 '20 at 23:01

1 Answers1

3

At the moment of writing this (22/01/2021) this query provides correct number of distinct values:

SELECT COUNT(UniqueFields) AS UniqueCount
FROM (SELECT DISTINCT c.field
    FROM c
    WHERE c.field = 'abc' AND c.partitionKeyField = '123') as UniqueFields
Krzysztof Madej
  • 32,704
  • 10
  • 78
  • 107