33

I have documents like this:

{
    "_id" : ObjectId("557eaf444ba222d545c3dffc"),
    "foreing" : ObjectId("538726124ba2222c0c0248ae"),
    "value" : "test",
}

I want to find all documents which have duplicated values for pair foreing & value.

chridam
  • 100,957
  • 23
  • 236
  • 235
Haris Hajdarevic
  • 1,535
  • 2
  • 25
  • 39

2 Answers2

43

You can easily identify the duplicates by running the following aggregation pipeline operation:

db.collection.aggregate([
    { 
        "$group": { 
            "_id": { "foreing": "$foreing", "value": "$value" }, 
            "uniqueIds": { "$addToSet": "$_id" },
            "count": { "$sum": 1 } 
        }
    }, 
    { "$match": { "count": { "$gt": 1 } } }
])

The $group operator in the first step is used to group the documents by the foreign and value key values and then create an array of _id values for each of the grouped documents as the uniqueIds field using the $addToSet operator. This gives you an array of unique expression values for each group. Get the total number of grouped documents to use in the later pipeline stages with the $sum operator.

In the second pipeline stage, use the $match operator to filter out all documents with a count of 1. The filtered-out documents represent unique index keys.

The remaining documents will be those in the collection that have duplicate key values for pair foreing & value.

chridam
  • 100,957
  • 23
  • 236
  • 235
  • 1
    Is there a way to return directly the ids please ? I mean, the content of the uniqueIds array. – fallais May 03 '18 at 15:11
  • 3
    Really amazing, in first I got disk error, so I passed 2nd parameter to allow disk `... ,{allowDiskUse:true}` – Qazi Nov 05 '19 at 08:41
25

We only have to group on the bases of 2 keys, and select the elements with count greater than 1, to find the duplicates.

Query :- Will be like

db.mycollection.aggregate(
    { $group: { 
        _id: { foreing: "$foreing", value: "$value" },
        count: { $sum:  1 },
        docs: { $push: "$_id" }
    }},
    { $match: {
        count: { $gt : 1 }
    }}
)

OUTPUT :- Will be like

{
    "result" : [
        {
            "_id" : {
                "foreing" : 1,
                "value" : 2
            },
            "count" : 2,
            "docs" : [
                ObjectId("34567887654345678987"),
                ObjectId("34567887654345678987")
            ]
        }
    ],
    "ok" : 1
}

Reference Link :- How to find mongo documents with a same field

Community
  • 1
  • 1
Nishant
  • 3,614
  • 1
  • 20
  • 26