I found this solution that works with MongoDB 3.4:
I'll assume the field with duplicates is called fieldX
db.collection.aggregate([
{
// only match documents that have this field
// you can omit this stage if you don't have missing fieldX
$match: {"fieldX": {$nin:[null]}}
},
{
$group: { "_id": "$fieldX", "doc" : {"$first": "$$ROOT"}}
},
{
$replaceRoot: { "newRoot": "$doc"}
}
],
{allowDiskUse:true})
Being new to mongoDB, I spent a lot of time and used other lengthy solutions to find and delete duplicates. However, I think this solution is neat and easy to understand.
It works by first matching documents that contain fieldX (I had some documents without this field, and I got one extra empty result).
The next stage groups documents by fieldX, and only inserts the $first document in each group using $$ROOT. Finally, it replaces the whole aggregated group by the document found using $first and $$ROOT.
I had to add allowDiskUse because my collection is large.
You can add this after any number of pipelines, and although the documentation for $first mentions a sort stage prior to using $first, it worked for me without it. " couldnt post a link here, my reputation is less than 10 :( "
You can save the results to a new collection by adding an $out stage...
Alternatively, if one is only interested in a few fields e.g. field1, field2, and not the whole document, in the group stage without replaceRoot:
db.collection.aggregate([
{
// only match documents that have this field
$match: {"fieldX": {$nin:[null]}}
},
{
$group: { "_id": "$fieldX", "field1": {"$first": "$$ROOT.field1"}, "field2": { "$first": "$field2" }}
}
],
{allowDiskUse:true})