151

Everybody. In mongo group query, the result shows only the key(s) in arguments. How to keep the first document in each group like mysql query group. for example:

-------------------------------------------------------------------------
|  name  | age  |  sex  | province |   city   |   area   |   address     |
-------------------------------------------------------------------------
| ddl1st | 22   | 纯爷们 |  BeiJing |  BeiJing | ChaoYang | QingNianLu    |
| ddl1st | 24   | 纯爷们 |  BeiJing |  BeiJing | XuHui    | ZhaoJiaBangLu |
|  24k   | 220  | ...   |  ....    |  ...     | ...      | ...           |
-------------------------------------------------------------------------



db.users.group({key: { name: 1},reduce: function ( curr, result ) { result.count ++ },initial: {count : 0 } })

result:

[
{
    "name" : "ddl1st",
    "count" : 1
},
{
    "name" : "24k",
    "count" : 1
}
]

How to get the following:

[
   {
   "name" : "ddl1st",
   "age" : 22,
   "sex" : "纯爷们",
   "province" : "BeiJing",
   "city" : "BeiJing",
   "area" : "ChaoYang",
   "address" : "QingNianLu",
   "count" : 1
   },
   {
   "name" : "24k",
   "age" : 220,
   "sex" : "...",
   "province" : "...",
   "city" : "...",
   "area" : "...",
   "address" : "...",
   "count" : 1
}
]
SWalters
  • 3,615
  • 5
  • 30
  • 37
plusor
  • 1,711
  • 2
  • 11
  • 11

12 Answers12

308

If you want to keep the information about the first matching entries for each group, you can try aggregating like:

    db.test.aggregate([{
      $group: {
         _id : '$name',
         name : { $first: '$name' },
         age : { $first: '$age' },
         sex : { $first: '$sex' },
         province : { $first: '$province' },
         city : { $first: '$city' },
         area : { $first: '$area' },
         address : { $first: '$address' },
         count : { $sum: 1 },
      }
    }]);
Anatoly
  • 20,799
  • 3
  • 28
  • 42
MervS
  • 5,724
  • 3
  • 23
  • 37
  • I am not familiar with mongoexport but it may have something to do with your usage of single and double quotes. – MervS May 22 '13 at 02:23
  • 9
    Why do you need `{$first: '$age'}` etc.? Is it possible to just have `age: $age`? – lightalchemist Dec 16 '17 at 08:04
  • 11
    @lightalchemist It is not possible. It's kind of a trick to let 'group' know what to choose. – TechWisdom Mar 24 '18 at 23:21
  • 5
    What if instead of count this aggregation was doing a $max or $min for age ? The $first would not necessarily match with the min or max age found for the other fields. How to deal with that then? – Juliomac Jun 01 '18 at 22:48
  • 3
    This does not work, it groups by the other fields which is undesired. – Jack Cole Aug 02 '18 at 17:24
  • 2
    @Juliomac, I believe if your desired output is $max/$min and keeping fields that are not in the `$group` _id, you can `$sort` before with the desired field then group and use `$first` or `$last` operators on any field. When accumulating, the idea to include other fields (which are accumulated/funneled/reduced) doesn't make so much sense even theoretically. However, sorting before hand is actually inefficient compared to sorting each group within themselves since sorting algorithms are more complex than O(n). I wish there would be better ways in MongoDB. – Vemulo May 24 '19 at 08:24
  • Hi, I am using the same aggregate pipeline format, but in case if the field doesnt exits, $first or $last will take the value from the next record for that field and remaining field values will be from the first record, how do we format the query in such a way that all the values are from the same record? – Anoopkr05 Jan 12 '21 at 06:44
  • @MervS: When you are grouping by a field, there might be so many matches. For example you are grouping a collection of customers by zip codes there might be many customers having same zipcode right. Now say you want to keep a field like name or age in aggregation output but as it is grouped, which value will you choose? First, second, third or last? To avoid this dilemma we can suggest $first to show first value. But if you want all the values for a field in aggregation o/p, like say age of all customers matched by a zipcode 60001 use $addtoSet as below age : { $addToSet: '$' }, – Binoy S Kumar Sep 10 '22 at 04:38
55

[edited to include comment suggestions]

I came here looking for an answer but wasn't happy with the selected answer (especially given it's age). I found this answer that is a better solution (adapted):

db.test.aggregate({
  $group: {
    _id: '$name',
   person: { "$first": "$$ROOT" },
   count: { $sum: 1 }
  },
  {
    "$replaceRoot": { "newRoot": { "$mergeObjects": ["$person", { count: "$count" }]} }
  }
}
Pieter
  • 2,188
  • 20
  • 27
  • 5
    BUT, you lose the `count` field. You need use `$mergeObjects` to keep it. – 0zkr PM May 21 '20 at 04:34
  • 4
    To elaborate on 0zkr's comment about using $mergeObjects and help others with the syntax, the last pipeline syntax would be `{"$replaceRoot": {"newRoot": {"$mergeObjects": ["$person", {count: "$count"}]}}}` – Jerren Saunders Sep 14 '20 at 16:32
29

By the way, if you want to keep not only the first document, you can use$addToSet For example:

db.test.aggregate({
  $group: {
    _id: '$name',
    name : { $addToSet: '$name' }
    age : { $addToSet: '$age' },
    count: { $sum: 1 }
  }
}
xueshuo
  • 399
  • 3
  • 5
  • 1
    Thanks! Got it better (avoid messing the order with a Set) : data : {$addToSet: {name: '$name', _id: '$_id', age: '$age' } } – Benoit Jun 19 '19 at 22:59
  • 1
    Worth always bearing in mind when using the `$addToSet` pipeline operator, is that it returns *unique* values. – Andrew Hardiman Jan 27 '21 at 12:16
13

You can try out this

db.test.aggregate({
      { $group: 
            { _id: '$name',count: { $sum: 1 }, data: { $push: '$$ROOT' } } },
      {
        $project: {
          _id:0,
          data:1,
          count :1
        }
      }

}
Deeksha Sharma
  • 3,199
  • 1
  • 19
  • 16
12

Use $first with the $$ROOT document and then use $replaceRoot with the first field.

db.test.aggregate([
  { "$group": {
    "_id": "$name",
    "doc": { "$first": "$$ROOT" }
  }},
  { "$replaceRoot": { "newRoot": "$doc" }}
])
Ashh
  • 44,693
  • 14
  • 105
  • 132
11

Just a quick update if one faces the same issue with documents with numerous fields. One can use the power of combining the $replaceRoot pipeline stage and the $mergeObjects pipeline operator.

db.users.aggregate([
  {
    $group: {
      _id: '$name',
      user: { $first: '$$ROOT' },
      count: { $sum: 1 }
    },
  },
  {
    $replaceRoot: {
      newRoot: { $mergeObjects: [{ count: '$count' }, '$user'] }
    }
  }
])
Gary Wild
  • 601
  • 7
  • 5
  • best answer, thank you, solves my problem. – Abdullah Oladipo Jan 20 '22 at 04:29
  • Keep in mind that in $mergeObjects the order matters, so if you have a field named count in the $$ROOT then better to use it like this: ['$user', { count: '$count' }]. Otherwise it will be overwritten by the original field during the merge. (I know that the example do not indicate this comment but with a different collection this info might be usefull) – Véger Lóránd May 29 '23 at 08:25
5

This is what i did, it works fine.

db.person.aggregate([
{
  $group: { _id: '$name'}, // pass the set of field to be grouped
   age : { $first: '$age' }, // retain remaining field
   count: { $sum: 1 } // count based on your group
},
{
  $project:{
       name:"$_id.name",
       age: "$age",
       count: "$count",
       _id:0 
  }
}])
Thavaprakash Swaminathan
  • 6,226
  • 2
  • 30
  • 31
1

I didn't know about .group helper, but if you prefer to go with the Aggregation Framework, then you'll have to specify which fields to return. Correct me if I'm wrong, but in SQL you would have to do that anyway.

Well, this is how you would do it with the Aggregation Framework mentioned before:

db.test.aggregate({
  $group: {
    _id: { name: "$name", city: "$city", fieldName: "$fieldName" },
    count: { $sum: 1 }
  }
})
gustavohenke
  • 40,997
  • 14
  • 121
  • 129
  • 11
    thanks for you help. in this query is group specified fields, i just want group by one field, and then result others specify fields. any good idea? – plusor May 21 '13 at 05:23
1

I created this function to generalise reversing an unwind stage... let me know if you guys come across any bugs with it, but it's working well for me!

const createReverseUnwindStages = unwoundField => {
  const stages = [
    //
    // Group by the unwound field, pushing each unwound value into an array,
    //
    // Store the data from the first unwound document
    // (which should all be the same apart from the unwound field)
    // on a field called data.
    // This is important, since otherwise we have to specify every field we want to keep individually.
    //
    {
      $group: {
        _id: '$_id',
        data: {$first: '$$ROOT'},
        [unwoundField]: {$push: `$${unwoundField}`},
      },
    },

    //
    // Copy the array of unwound fields resulting from the group into the data object,
    // overwriting the singular unwound value
    //
    {
      $addFields: {[`data.${unwoundField}`]: `$${unwoundField}`},
    },

    //
    // Replace the root with our data object
    //
    {
      $replaceRoot: {
        newRoot: '$data',
      },
    },
  ]

  return stages
}
Matt Wills
  • 676
  • 6
  • 11
1

If you want to project all fields document this use below query.

db.persons.aggregate({
      { $group: { _id: '$name', data: { $push: '$$ROOT' }, total: { $sum: 1 }} },
      {
        $project: {
          _id:0,
          data:1,
          total :1
        }
      }
}
Sameer
  • 509
  • 3
  • 16
0

I like to put everything that is going to be used with the $first option into a dictionary to extract from at the end.

{'$set': 
  {'collection_name':
    'collection_item1': '$collection_item1',
    'collection_item2': '$collection_item2',
    ...
  }
}

Now, just copy the dictionary and you no longer have to lug around all that information 1 at a time!

{'$group':
  '_id': ['$id'],
  'collection_name': {'$first': '$collection_name'}
}
-2

Here is the answer >>>>

    $m = new \MongoDB\Driver\Manager();

    $command = new \MongoDB\Driver\Command([
        'aggregate' => 'mytestusers',
        'pipeline' => [
            ['$match' => ['name' => 'Pankaj Choudhary']],

            ['$unwind'=>'$skills'],
            ['$lookup' => array('from'=>'mytestskills','localField'=>'skills','foreignField'=>'_id','as'=>'sdfg')],
            ['$unwind'=>'$sdfg'],

            ['$group'=>array('_id'=>array('_id'=>'$_id','name'=>'$name','email'=>'$email'),'skills'=>array('$push'=>'$skills'),'sdfg'=>array('$push'=>'$sdfg'))],


        ],
        'cursor' => new \stdClass,
    ]);
    $cursor = $m->executeCommand('targetjob-plus', $command);
    $result = $cursor->toArray();
Pankaj Cheema
  • 1,028
  • 2
  • 13
  • 26