2

I'm trying to write a query that uses a JOIN to perform a geo-spatial match against locations in a array. I got it working, but added DISTINCT in order to de-duplicate (Query A):

SELECT DISTINCT VALUE
    u
FROM
    u
JOIN loc IN u.locations
WHERE
    ST_WITHIN(
        {'type':'Point','coordinates':[loc.longitude,loc.latitude]},
        {'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})

However, I then found that combining DISTINCT with continuation tokens isn't supported unless you also add ORDER BY:

System.ArgumentException: Distict query requires a matching order by in order to return a continuation token. If you would like to serve this query through continuation tokens, then please rewrite the query in the form 'SELECT DISTINCT VALUE c.blah FROM c ORDER BY c.blah' and please make sure that there is a range index on 'c.blah'. 

So I tried adding ORDER BY like this (Query B):

SELECT DISTINCT VALUE
    u
FROM
    u
JOIN loc IN u.locations
WHERE
    ST_WITHIN(
        {'type':'Point','coordinates':[loc.longitude,loc.latitude]},
        {'type':'Polygon','coordinates':[[[-108,-43],[-108,-40],[-110,-40],[-110,-43],[-108,-43]]]})
ORDER BY
    u.created

The problem is, the DISTINCT no longer appears to be taking effect because it returns, for example, the same record twice.

To reproduce this, create a single document with this data:

{
    "id": "b6dd3e9b-e6c5-4e5a-a257-371e386f1c2e",
    "locations": [
        {
            "latitude": -42,
            "longitude": -109
        },
        {
            "latitude": -42,
            "longitude": -109
        }
    ],
    "created": "2019-03-06T03:43:52.328Z"
}

Then run Query A above. You will get a single result, despite the fact that both locations match the predicate. If you remove the DISTINCT, you'll get the same document twice.

Now run Query B and you'll see it returns the same document twice, despite the DISTINCT clause.

What am I doing wrong here?

me--
  • 1,978
  • 1
  • 22
  • 42

1 Answers1

0

Reproduced your issue indeed,based on my researching,it seems a defect in cosmos db distinct query. Please refer to this link:Provide support for DISTINCT.

This feature is broke in the data explorer. Because cosmos can only return 100 results per page at a time, the distinct keyword will only apply to a single page. So, if your result set contains more than 100 results, you may still get duplicates back - they will simply be on separately paged result sets.

You could describe your own situation and vote up this feedback case.

Jay Gong
  • 23,163
  • 2
  • 27
  • 32