
I am unwinding an array with the MongoDB aggregation framework. The array contains duplicates, and I need to ignore those duplicates in a later grouping stage.

How can I achieve that?

l a s

3 Answers


You can use $addToSet to do this:

db.users.aggregate([
  // flatten the array so each element becomes its own document
  { $unwind: '$data' },
  // regroup by document _id; $addToSet keeps only unique values
  { $group: { _id: '$_id', data: { $addToSet: '$data' } } }
]);
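
For example, with a hypothetical document like the one below, the pipeline collapses the duplicates (note that $addToSet does not guarantee element order):

// hypothetical input document
{ _id: 1, data: ["a", "b", "b", "c"] }

// result of the pipeline above
{ _id: 1, data: ["a", "b", "c"] }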

It's hard to give you a more specific answer without seeing your actual query.

Roman Pekar
  • I thought of that. I can unwind, $addToSet, and unwind again, but I wanted to see if there is a different way. Thanks. – l a s Sep 14 '13 at 17:28
  • You can just group by _id and the array element (without $addToSet); can't say more without your query. – Roman Pekar Sep 14 '13 at 17:34 (see the sketch below)
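
A minimal sketch of the alternative mentioned in the comment above (grouping on the _id/element pair instead of using $addToSet), assuming the same users collection as in the answer; the shape of the group key here is illustrative:

db.users.aggregate([
  { $unwind: '$data' },
  // duplicates collapse because identical (_id, element) pairs land in one group
  { $group: { _id: { id: '$_id', element: '$data' } } },
  // rebuild one deduplicated array per original document
  { $group: { _id: '$_id.id', data: { $push: '$_id.element' } } }
]);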

You have to use $addToSet, but first you have to group by _id; if you don't, you'll get one result document per item in the list.

Imagine a collection posts with documents like this:

{
     body: "Lorem Ipsum...", 
     tags: ["stuff", "lorem", "lorem"],
     author: "Enrique Coslado"
}

Imagine you want to calculate the most common tag per author. You'd write an aggregation query like this:

db.posts.aggregate([
    // keep the fields we need; expose the post id under its own name
    {$project: {
        author: "$author", 
        tags: "$tags", 
        post_id: "$_id"
    }}, 

    // one document per tag occurrence
    {$unwind: "$tags"}, 

    // regroup per post; $addToSet drops duplicate tags within a post
    {$group: {
        _id: "$post_id", 
        author: {$first: "$author"}, 
        tags: {$addToSet: "$tags"}
    }}, 

    // one document per unique tag
    {$unwind: "$tags"},

    // count how many posts each (author, tag) pair appears in
    {$group: {
        _id: {
            author: "$author",
            tags: "$tags"
        },
        count: {$sum: 1}
    }}
])

That way you'll get documents like the following. Note that the duplicated "lorem" tag only counts once, because $addToSet removed the duplicate before the second $unwind:

{
     _id: {
         author: "Enrique Coslado", 
         tags: "lorem"
     },
     count: 1
}
Enrique Coslado
  • Great answer; I would just add a `{$sort: {count: -1}}` as the last stage of the pipeline in order to sort. – Diego Feb 17 '16 at 20:33

The previous answers are correct, but the $unwind -> $group -> $unwind procedure can be simplified. You can use $addFields with $reduce to pass the pipeline a filtered array that already contains unique entries, and then $unwind only once.

Example document:

{
     body: "Lorem Ipsum...", 
     tags: [{title: 'test1'}, {title: 'test2'}, {title: 'test1'}],
     author: "First Last name"
}

Query:

db.posts.aggregate([
    // build a deduplicated array of tag titles without unwinding first
    {$addFields: {
        "uniqueTag": {
            $reduce: {
                input: "$tags",
                initialValue: [],
                // $setUnion adds each title only if it is not already present
                in: {$setUnion: ["$$value", ["$$this.title"]]}
            }
        }
    }}, 

    // a single $unwind over the already-unique titles
    {$unwind: "$uniqueTag"}, 

    {$group: {
        _id: {
            author: "$author",
            tags: "$uniqueTag"
        },
        count: {$sum: 1}
    }}
])
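
For the example document above, this would produce documents like the following; the duplicated 'test1' entry counts only once because $setUnion already removed it:

{
     _id: {
         author: "First Last name",
         tags: "test1"
     },
     count: 1
}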
cephuo
  • This answers the question I was going to ask about how to remove duplicate array entries in a document field. I knew I could use $unwind/$addToSet, but it just feels like that step would duplicate so much data that it would be slower than fetching everything and doing the aggregation in code. – T Tse Sep 21 '20 at 04:14
  • This is an incredibly expensive operation. – chovy Dec 27 '20 at 08:25