
I have a collection that contains many duplicates because of the routines that originally populated it. How can I dedupe it?

For example:

    { "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
    { "_id" : ObjectId("531a731148757e17587a6e04"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
    { "_id" : ObjectId("531a7bb848757e1f7c0ca702"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }

I want only one of them to remain (I don't care which ObjectId is kept):

    { "_id" : ObjectId("531a5fe448757e00244096fa"), "code" : "ap", "name" : "[Almost Perfect]", "value" : "[u'*']" }
user2091936
  • Possible duplicates: http://stackoverflow.com/questions/13190370/how-to-remove-duplicates-based-on-a-key-in-mongodb and http://stackoverflow.com/questions/8405331/how-to-remove-duplicate-record-in-mongodb-by-mapreduce – dnl-blkv Mar 08 '14 at 02:18
  • And the method I would choose is adding the unique index. MapReduce output will not be the same as the collection. Even if you don't want the unique index, do that to remove duplicates and then drop the index afterwards. – Neil Lunn Mar 08 '14 at 02:46
  • Possible duplicate of [Fastest way to remove duplicate documents in mongodb](http://stackoverflow.com/questions/14184099/fastest-way-to-remove-duplicate-documents-in-mongodb) – Somnath Muluk Oct 27 '15 at 09:45
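For the duplicates shown in the question, where code, name and value are all identical, one approach that needs no index is to group the documents with the aggregation framework, keep one _id per group, and delete the rest. The sketch below assumes exactly that definition of "duplicate". The function name removeDuplicates is hypothetical, and it takes the collection as a parameter so it can be pasted into mongosh.

```javascript
// Sketch for mongosh. Assumption: two documents are duplicates when code,
// name and value are all equal; which _id survives does not matter.
function removeDuplicates(coll) {
  const groups = coll
    .aggregate(
      [
        // Collect the _ids of all documents that share the same three fields.
        {
          $group: {
            _id: { code: "$code", name: "$name", value: "$value" },
            ids: { $push: "$_id" },
            count: { $sum: 1 }
          }
        },
        // Keep only the groups that actually contain duplicates.
        { $match: { count: { $gt: 1 } } }
      ],
      { allowDiskUse: true } // lets $group spill to disk on large collections
    )
    .toArray();

  // Keep the first _id in each group; every other _id is surplus.
  const surplus = [];
  for (const group of groups) {
    for (const id of group.ids.slice(1)) {
      surplus.push(id);
    }
  }

  if (surplus.length === 0) {
    return 0; // nothing to delete
  }
  // Remove all surplus documents with a single command.
  return coll.deleteMany({ _id: { $in: surplus } }).deletedCount;
}
```

Usage in mongosh would look like removeDuplicates(db.getCollection("yourCollection")), where the collection name is a placeholder. With a very large number of duplicates the $in list may approach the 16 MB BSON size limit, in which case deleting in batches would be safer.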

1 Answer


You should create a unique index on your code field:

    db.<collection>.ensureIndex({'code' : 1}, {unique : true, dropDups : true})
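  • unique ensures that no new duplicates can be inserted afterwards.
  • dropDups deletes the duplicate documents when ensureIndex runs, keeping only the first document found for each code value. Because the index covers only code, documents that share a code but differ in name or value are deleted as well. Note that this option was removed in MongoDB 3.0.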
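On MongoDB 3.0 and later, where dropDups no longer exists and createIndex is the preferred name for ensureIndex, a unique index on code cannot be built while duplicate code values remain. A minimal sketch, assuming (as this answer does) that code alone identifies a document: first check for repeated code values, and only create the unique index when none are left. The function name createUniqueCodeIndex is hypothetical.

```javascript
// Sketch for mongosh on MongoDB 3.0+, where dropDups no longer exists.
// Assumption (as in the answer): code alone identifies a document.
function createUniqueCodeIndex(coll) {
  // List every code value that still appears in more than one document.
  const dupes = coll
    .aggregate(
      [
        { $group: { _id: "$code", count: { $sum: 1 } } },
        { $match: { count: { $gt: 1 } } }
      ],
      { allowDiskUse: true }
    )
    .toArray();

  if (dupes.length > 0) {
    // Building the unique index now would fail with an E11000 duplicate key error.
    return { created: false, duplicateCodes: dupes.map((d) => d._id) };
  }
  // From here on the server rejects inserts or updates that repeat a code.
  coll.createIndex({ code: 1 }, { unique: true });
  return { created: true, duplicateCodes: [] };
}
```

In mongosh this could be called as createUniqueCodeIndex(db.getCollection("yourCollection")), with the collection name as a placeholder. If duplicateCodes comes back non-empty, the extra documents for those codes have to be deleted first, for example by grouping on code and removing all but one _id per group.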
CesarTrigo