
In the following example, "Algorithms in C++" is present twice in the `favorites.books` array.

The $unset modifier can remove a particular field, but how do you remove a single entry from an array field?

{
  "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), 
  "favorites" : {
    "books" : [
      "Algorithms in C++",    
      "The Art of Computer Programming", 
      "Graph Theory",      
      "Algorithms in C++"
    ]
  }, 
  "name" : "robert"
}
Xavier Guihot
P K

5 Answers


As of MongoDB 2.2 you can use the aggregation framework with an $unwind, $group and $project stage to achieve this:

db.users.aggregate([{$unwind: '$favorites.books'},
                    {$group: {_id: '$_id',
                              books: {$addToSet: '$favorites.books'},
                              name: {$first: '$name'}}},
                    {$project: {'favorites.books': '$books', name: '$name'}}
                   ])

Note the need for the $project to rename the favorites field, since $group aggregate fields cannot be nested.
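The dedup step performed by $unwind plus $group with $addToSet can be mimicked in plain JavaScript. A rough sketch (note that $addToSet makes no ordering guarantee, while this sketch happens to keep first-seen order):

```javascript
// Sketch: what $unwind + $group + $addToSet do to one document's array.
const books = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++"
];

// $unwind yields one element at a time; $addToSet keeps each value once.
const deduped = books.reduce(
  (set, b) => (set.includes(b) ? set : set.concat(b)),
  []
);

console.log(deduped.length); // 3
```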

kynan
  • That is the right solution if you need to pipe more operators from the aggregation framework (to do statistics, for example). Thank you Kynan! – Michael Rambeau Mar 01 '14 at 05:21
  • 2
    in `$group` stage why are you using `name: {$first: '$name'}`? – Towhid Jul 09 '14 at 19:12
  • @Towhid Because each unwound entry has the same `name`, so you can take any in the `$group` stage, so I'm just taking the first. – kynan Jul 11 '14 at 17:35
  • the problem with this is that $unwind can generate a really large number of documents in the pipeline (I just ran into a case where unwind generated 1 million documents), and the group stage memory is limited to 100MB by default; you could increase the available memory, but that is not always possible, nor desirable. – saljuama Jul 07 '15 at 18:31
  • @SalvadorJuanMartinez in this case you can always go to a full blown map reduce – kynan Jul 08 '15 at 08:56
  • or introduce pipeline optimization phases early using a $match stage, but this is not always possible; it depends on the given situation – saljuama Jul 08 '15 at 15:31

The easiest solution is to use $setUnion (available since Mongo 2.6; the $addFields stage used below requires Mongo 3.4+):

db.users.aggregate([
    {'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
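$setUnion treats its operands as sets, so taking the union with an empty array simply discards duplicates (the output order is not guaranteed). The same idea as a rough plain-JavaScript analogue:

```javascript
// Rough analogue of { $setUnion: ['$favorites.books', []] }:
// set union discards duplicate elements.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

const books = ["Algorithms in C++", "Graph Theory", "Algorithms in C++"];
console.log(setUnion(books, [])); // [ 'Algorithms in C++', 'Graph Theory' ]
```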

Another (more lengthy) version that is based on the idea from @kynan's answer, but preserves all the other fields without explicitly specifying them (Mongo 3.4+):

> db.users.aggregate([
    {'$unwind': {
        'path': '$favorites.books',
        // output the document even if its list of books is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'books': {'$addToSet': '$favorites.books'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // the field, in the resulting document, has the value from the last document merged for the field. (c) docs
      // so the new deduped array value will be used
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    // this stage wouldn't be necessary if the field wasn't nested
    {'$addFields': {'favorites.books': '$books'}},
    {'$project': {'_other_fields': 0, 'books': 0}}
])

{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }
Dennis Golomazov
  • I'm a Mongo newbie. I found that I still need to add this aggregate to an updateMany() to actually update the records in the database. Is that the case? – milesmeow Jul 28 '23 at 15:23
  • @milesmeow it's been 5 years, so I don't remember exactly, but I think that the command above should be enough, as it includes `addFields`. Maybe something changed in the API recently. – Dennis Golomazov Jul 31 '23 at 20:06

What you have to do is use map reduce to detect and count duplicate tags, then use $set to replace the entire books array for the document matched by { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }.

This has been discussed several times here; please see:

Removing duplicate records using MapReduce

Fast way to find duplicates on indexed column in mongodb

http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce

http://www.mongodb.org/display/DOCS/MapReduce

How to remove duplicate record in MongoDB by MapReduce?
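The map/reduce idea from the links above (emit each title as a key, then count per key; a count greater than 1 marks a duplicate) can be sketched in plain JavaScript. Note this is a conceptual sketch with a hypothetical helper name, not Mongo's mapReduce API:

```javascript
// Count occurrences per book title; counts > 1 mark duplicates.
function countTitles(books) {
  const counts = {};
  for (const title of books) {
    counts[title] = (counts[title] || 0) + 1;
  }
  return counts;
}

const counts = countTitles([
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Algorithms in C++"
]);
console.log(counts["Algorithms in C++"]); // 2
```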

Baba

Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.

For instance, in order to remove duplicates from an array:

// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory",
//     "Algorithms in C++"
//   ]},
//   "name" : "robert"
// }
db.collection.aggregate(
  { $set:
    { "favorites.books":
      { $function: {
          body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
          args: ["$favorites.books"],
          lang: "js"
      }}
    }
  }
)
// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory"
//   ]},
//   "name" : "robert"
// }

This has the advantages of:

  • keeping the original order of the array (if that's not a requirement, then prefer @Dennis Golomazov's $setUnion answer)
  • being more efficient than a combination of expensive $unwind and $group stages.

$function takes 3 parameters:

  • body, which is the function to apply, whose parameter is the array to modify.
  • args, which contains the fields from the record that the body function takes as parameters. In our case, "$favorites.books".
  • lang, which is the language in which the body function is written. Only js is currently available.
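Since lang is "js", the body is ordinary JavaScript and can be tried outside of Mongo, for instance in Node:

```javascript
// Same dedup body as in the $function stage above:
// keep an element only at the index of its first occurrence.
const body = function (books) {
  return books.filter((v, i, a) => a.indexOf(v) === i);
};

const result = body([
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++"
]);
console.log(result); // keeps first occurrences, original order preserved
```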
Xavier Guihot
  • It is worth noting for anyone using MongoDB Atlas free tier that this command is not available on that tier. Still +1'd for the solution. – Jason Holtzen Dec 09 '21 at 03:35
function unique(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (!hash.hasOwnProperty(arr[i])) {
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

db.collection.find({}).forEach(function (doc) {
    db.collection.updateOne(
        { _id: doc._id },
        { $set: { "favorites.books": unique(doc.favorites.books) } }
    );
})
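On shells with ES6 support, the unique() helper above can be shortened with a Set, which also keeps first-seen order:

```javascript
// One-line equivalent of the unique() helper: a Set preserves
// insertion order and ignores repeated values.
function unique(arr) {
  return [...new Set(arr)];
}

console.log(unique(["a", "b", "a", "c"])); // [ 'a', 'b', 'c' ]
```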
MeVimalkumar
  • By bringing the logic out of MongoDB (and losing their native optimizations) this is almost guaranteed to be slower. While it may be useful to some (so I won't downvote), I'm sure it's unnecessarily inefficient and complex for many. – Alex L Jan 26 '21 at 16:45