0

The HOUR_COUNTS collections contains {docId, hour, count}

It's very easy for me to get the sum of the count of docId by using the following mongodb query:

db.HOUR_COUNTS.aggregate(
    [
        {
            $match: { hour: { $gte: 10 } }
        },
        {
            $group: { _id: "$docId", total: { $sum: "$count" } }
        },
        { 
            $sort: { total: -1, _id: -1 }
        },
        {
            $limit: 20
        }
    ]
)

Then I can get the following result:

{ "_id" : 6831, "total" : 6 }
{ "_id" : 6830, "total" : 6 }
{ "_id" : 6849, "total" : 4 }
{ "_id" : 6848, "total" : 4 }
{ "_id" : 6847, "total" : 3 }

It's the time for me to do it using Spring Data

I tried to do this but it is not going to work:

    Aggregation agg = newAggregation(
            match(where("hour").gte(0)),
            project("docId"),
            group("docId").sum("count").as("total"), 
            project("total").and("docId").previousOperation(), 
            sort(Sort.Direction.DESC, "total", "docId"),
            limit(20)
        );

The error is:

java.lang.IllegalArgumentException: Invalid reference 'count'!

Therefore, I would like to know how to make the query work on Spring Data. Thank you.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
fmchan
  • 760
  • 1
  • 11
  • 29

1 Answers1

2

Why would this be expected to work? Which is the question you should be asking yourself really.

In the aggregation pipeline operations, operators such as $project and $group only ever "return" the fields that you explicitly ask them to. As a "pipeline" concept, only the "output" of the previous piped stage is available to the next stage and those thereafter, until otherwise possibly modified again.

So what you wrote in your Java code is not equal to what you experimented in the shell. You try to refer to a "field" you excluded with a prior $project operation. So don't do that. You seem to have a false view on how things actually optimize in the aggregation pipeline:

    Aggregation agg = newAggregation(
        match(Criteria.where("hour").gte(10)),
        group("docId").sum("count").as("total"),
        sort(Sort.Direction.DESC, "total","docId"),
        limit(20)
    );

So that is actually the "same" as what you wrote before. You don't need the additional "project" operations, and they are detrimental to your intended result.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Thank you. I have tried to do so but unfortunately it shows all docId=0, the total is correct but just cannot display docId – fmchan Aug 20 '14 at 01:58
  • oh great I changed the output model which called HourCountResult. previous variables include {docId, total} and then I changed as {id, total} so it works. However are there any ways to keep using docId instead of id. Thank you. – fmchan Aug 20 '14 at 02:03
  • @fmchan The `$group` pipeline stage uses `_id` as a mandatory grouping key. You can always at `$project` at the "end" of your pipeline to omit `_id` and put the content under `docId`. – Neil Lunn Aug 20 '14 at 02:08