4

As the title suggests I'm wondering how to create an effective index for GROUP BY queries in CosmosDB.

Say the documents look something like:

{
    "pk": "12345",
    "speed": 500
},
{
    "pk": "6789",
    "speed": 100
}

Doing a query to find out the SUM of the speed grouped by the partition key would look something like:

SELECT c.pk, SUM(c.speed) FROM c WHERE c.pk IN ('12345','6789') GROUP BY c.pk

With about ~1.6 million documents this query costs 1489.51 RUs. However, splitting this up into two queries such as:

SELECT SUM(c.speed) FROM c WHERE c.pk = '12345'

SELECT SUM(c.speed) FROM c WHERE c.pk = '6789'

each of them cost only ~2.8 RUs each. Obviously the results would need some post-processing compared to the GROUP BY query to match. But a total of 5.6 RUs compared to 1489 RUs makes it worth it.

The indexing on the collection is as follows:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ],
    "compositeIndexes": [
        [
            {
                "path": "/pk",
                "order": "ascending"
            },
            {
                "path": "/speed",
                "order": "ascending"
            }
        ]
    ]
}

Am I completely missing something or how can the GROUP BY be so much more expensive? Is there any indexing I can do to bring it down?

Thanks in advance!

Maddnias
  • 43
  • 3

2 Answers2

4

Currently GROUP BY does not not yet use the index.

This is currently being worked on. I would revisit sometime towards the end of the year to verify it is supported.

Mark Brown
  • 8,113
  • 2
  • 17
  • 21
  • Understood, thanks for the answer. So I guess the best approach until then would be splitting the query up instead of grouping like described in the topic if I wish to save RUs? – Maddnias Sep 28 '21 at 06:31
  • Yes, run them as separate operations for now. – Mark Brown Sep 28 '21 at 15:06
2

This feature is supported now , The query engine in Azure Cosmos DB Core (SQL) API now has a new system function and optimizations for a set of query operations to better use the index.

Sajeetharan
  • 216,225
  • 63
  • 350
  • 396