Grouping by a field in DocumentDB

Question

Is it possible, in some way, to group upon a field in DocumentDB, stored procedure or not?

Let's say I have the following collection:

[
    {
        name: "Item A",
        priority: 1
    },
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    },
    {
        name: "Item D",
        priority: 1
    }
]

I would like to get all the items in the highest priority group (priority 2 in this case). I do not know what value of the highest priority. I.e.:

[
    {
        name: "Item B",
        priority: 2
    },
    {
        name: "Item C",
        priority: 2
    }
]

With some crude LINQ, it would look something like this:

var highestPriority = 
    collection
        .GroupBy(x => x.Priority)
        .OrderByDescending(x => x.Key)
        .First();

As of November 2019, Azure Cosmos DB now supports GROUP BY queries: https://learn.microsoft.com/en-gb/azure/cosmos-db/sql-query-group-by — Andrew Liu, Dec 02 '19 at 22:18

score 4 · Accepted Answer · answered Oct 27 '15 at 16:46

DocumentDB currently does not support GROUP BY nor any other aggregation. It is the second most requested feature and is listed as "Under Review" on the DocumentDB UserVoice.

In the mean time, documentdb-lumenize is an aggregation library for DocumentDB written as a stored procedure. You load cube.string as a stored procedure, then you call it with an aggregation configuration. It's a bit overkill for this example, but it's perfectly capable of doing what you are asking here. If you pass this into the stored procedure:

{cubeConfig: {groupBy: "name", field: "priority", f: "max"}}

that should do what you want.

Note, Lumenize can do a lot more than that including simple group-by's with other function (sum, count, min, max, median, p75, etc.), pivot tables, and all the way up to complicated n-dimensional hypercubes with multiple metrics per cell.

I have never tried loading cube.string from .NET because we're on node.js, but it is shipped as a string rather than javascript so you can easily load and send it.

Alternatively, you could write a stored procedure to do this simple aggregation.

Thanks for the answer. Would you mind to give me an example of such a stored proc? — Dave New, Oct 28 '15 at 06:20
Start with this sproc that counts the number of documents: https://github.com/lmaccherone/documentdb-mock/blob/master/stored-procedures/countDocuments.coffee Then modify the accumulator code `memo.count += count` to something like `memo.max = Math.max(memo.max, value)` in a loop for all of the returned rows. — Larry Maccherone, Oct 28 '15 at 16:56
the above link is expired, can I know the more about the Stored Procedure ? — Chandresh Khambhayata, Feb 14 '17 at 11:15
As of November 2019, Azure Cosmos DB now supports GROUP BY queries: https://learn.microsoft.com/en-gb/azure/cosmos-db/sql-query-group-by — Andrew Liu, Dec 02 '19 at 22:18

score 0 · Answer 2 · answered Jul 20 '18 at 08:11

GroupBy is still not supported in DocumentDB and the best method is described above (use a stored procedure) or as in the UserVoice item described use the Spark connector. However, if the set you want to group is relatively small, there is also another solution:

Get all the results without grouping from the collection and perform the grouping in-memory.

Thus instead of:

var highestPriority = 
collection
    .GroupBy(x => x.Priority)
    .OrderByDescending(x => x.Key)
    .First();

You use:

var highestPriority = 
collection
    .Where(<filter to reduce set>)
    .AsEnumerable()
    .GroupBy(x => x.Priority)
    .OrderByDescending(x => x.Key)
    .First();

The .AsEnumerable() gets your result from the documentDB and the groupBy is done afterwards in memory. But please note that this is not the optimal solution, and should only be used in cases where you know for sure that the result-set is small.

Grouping by a field in DocumentDB

2 Answers2

Linked