
I am unwinding an array with the MongoDB aggregation framework. The array contains duplicates, and I need to ignore those duplicates in a later grouping stage.

How can I achieve that?

l a s

3 Answers


You can use $addToSet to do this:

db.users.aggregate([
  // flatten the array so each element becomes its own document
  { $unwind: '$data' },
  // regroup by document _id; $addToSet keeps only unique values
  { $group: { _id: '$_id', data: { $addToSet: '$data' } } }
]);
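
For example, with a hypothetical document like the one below, the pipeline collapses the duplicates (note that $addToSet does not guarantee element order):

// hypothetical input document
{ _id: 1, data: ["a", "b", "b", "c"] }

// result of the pipeline above
{ _id: 1, data: ["a", "b", "c"] }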

It's hard to give you a more specific answer without seeing your actual query.

Roman Pekar
  • I thought of that. I can unwind, $addToSet, and unwind again, but I wanted to see if there is a different way. Thanks. – l a s Sep 14 '13 at 17:28
  • You can just group by _id and the array element (without $addToSet); can't say more without your query. – Roman Pekar Sep 14 '13 at 17:34 (see the sketch below)
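
A minimal sketch of the alternative mentioned in the comment above (grouping on the _id/element pair instead of using $addToSet), assuming the same users collection as in the answer; the shape of the group key here is illustrative:

db.users.aggregate([
  { $unwind: '$data' },
  // duplicates collapse because identical (_id, element) pairs land in one group
  { $group: { _id: { id: '$_id', element: '$data' } } },
  // rebuild one deduplicated array per original document
  { $group: { _id: '$_id.id', data: { $push: '$_id.element' } } }
]);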

You have to use $addToSet, but first you have to group by _id; if you don't, you'll get one result document per item in the list.

Imagine a collection posts with documents like this:

{
     body: "Lorem Ipsum...", 
     tags: ["stuff", "lorem", "lorem"],
     author: "Enrique Coslado"
}

Imagine you want to calculate the most common tag per author. You'd write an aggregation query like this:

db.posts.aggregate([
    // keep the fields we need; expose the post id under its own name
    {$project: {
        author: "$author", 
        tags: "$tags", 
        post_id: "$_id"
    }}, 

    // one document per tag occurrence
    {$unwind: "$tags"}, 

    // regroup per post; $addToSet drops duplicate tags within a post
    {$group: {
        _id: "$post_id", 
        author: {$first: "$author"}, 
        tags: {$addToSet: "$tags"}
    }}, 

    // one document per unique tag
    {$unwind: "$tags"},

    // count how many posts each (author, tag) pair appears in
    {$group: {
        _id: {
            author: "$author",
            tags: "$tags"
        },
        count: {$sum: 1}
    }}
])

That way you'll get documents like the following. Note that the duplicated "lorem" tag only counts once, because $addToSet removed the duplicate before the second $unwind:

{
     _id: {
         author: "Enrique Coslado", 
         tags: "lorem"
     },
     count: 1
}
Enrique Coslado
  • Great answer; I would just add a `{$sort: {count: -1}}` as the last stage of the pipeline in order to sort. – Diego Feb 17 '16 at 20:33

The previous answers are correct, but the $unwind -> $group -> $unwind procedure can be simplified. You can use $addFields with $reduce to pass the pipeline a filtered array that already contains unique entries, and then $unwind only once.

Example document:

{
     body: "Lorem Ipsum...", 
     tags: [{title: 'test1'}, {title: 'test2'}, {title: 'test1'}],
     author: "First Last name"
}

Query:

db.posts.aggregate([
    // build a deduplicated array of tag titles without unwinding first
    {$addFields: {
        "uniqueTag": {
            $reduce: {
                input: "$tags",
                initialValue: [],
                // $setUnion adds each title only if it is not already present
                in: {$setUnion: ["$$value", ["$$this.title"]]}
            }
        }
    }}, 

    // a single $unwind over the already-unique titles
    {$unwind: "$uniqueTag"}, 

    {$group: {
        _id: {
            author: "$author",
            tags: "$uniqueTag"
        },
        count: {$sum: 1}
    }}
])
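
For the example document above, this would produce documents like the following; the duplicated 'test1' entry counts only once because $setUnion already removed it:

{
     _id: {
         author: "First Last name",
         tags: "test1"
     },
     count: 1
}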
cephuo
  • This answers the question I was going to ask about how to remove duplicate array entries in a document field. I knew I could use $unwind/$addToSet, but it just feels like that step would duplicate so much data that it would be slower than fetching everything and doing the aggregation in code. – T Tse Sep 21 '20 at 04:14
  • This is an incredibly expensive operation. – chovy Dec 27 '20 at 08:25