I take data from a search box and insert it into MongoDB as a document using a regular insert query. For example, the word "cancer" is stored in a collection in the following format, with a unique "_id":

{
  "_id": {
    "$oid": "553862fa49aa20a608ee2b7b"
  },
  "0": "c",
  "1": "a",
  "2": "n",
  "3": "c",
  "4": "e",
  "5": "r"
}

Each document stores a single word in the same format as above, and I have many such documents. Now I want to remove the duplicate documents from the collection, but I cannot figure out how to do that. Please help.

José F. Romaniello
Vamshi
  • Does http://stackoverflow.com/questions/14184099/fastest-way-to-remove-duplicate-documents-in-mongodb help? Or http://stackoverflow.com/questions/13190370/how-to-remove-duplicates-based-on-a-key-in-mongodb? – Zee Apr 23 '15 at 09:08
  • No, Sourabh. What confuses me here is why each letter of the word is stored under its own key. – Vamshi Apr 23 '15 at 09:12
  • Normally you would do this by making the word the key since that is unique – Sammaye Apr 23 '15 at 09:15
  • Now I have a large number of duplicate documents with the same word. How can I remove them? – Vamshi Apr 23 '15 at 09:21

1 Answer

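An easy solution in the mongo shell (note that the dropDups option it relies on was removed in MongoDB 3.0, so this works only on older servers):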

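use your_db
// Build a temporary unique index over every letter position, starting at '0';
// add more keys until you reach the maximum expected letter count. dropDups deletes the duplicates.
db.your_collection.createIndex({'0': 1, '1': 1, '2': 1, '3': 1, '4': 1, '5': 1}, {unique: true, dropDups: true, sparse: true, name: 'dropdups'})
// The index was only needed to remove the duplicates, so drop it afterwards.
db.your_collection.dropIndex('dropdups')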

Notes:

  • If you have many documents, expect this procedure to take a very long time.
  • Be careful: this removes documents in place, so it is better to clone your collection first and try it there.
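
If dropDups is not available on your server, one possible alternative is to work out the duplicate "_id" values yourself and delete those. The following is only a rough sketch, not a tested migration: `findDuplicateIds` and the `sample` array are hypothetical names made up for illustration, and it assumes every document holds just "_id" plus the numbered letter keys shown in the question, and that the whole collection fits in memory.

```javascript
// Hypothetical helper (not part of MongoDB): given an array of documents,
// return the _id of every document whose non-_id fields repeat an earlier one.
function findDuplicateIds(docs) {
  const seen = new Set();
  const duplicateIds = [];
  for (const doc of docs) {
    // Build a comparison key from all fields except _id, with the field
    // names sorted so that the same word always produces the same key.
    const pairs = Object.keys(doc)
      .filter((name) => name !== '_id')
      .sort()
      .map((name) => [name, doc[name]]);
    const key = JSON.stringify(pairs);
    if (seen.has(key)) {
      duplicateIds.push(doc._id); // a later copy: mark it for removal
    } else {
      seen.add(key); // first occurrence: keep it
    }
  }
  return duplicateIds;
}

// Made-up sample data in the same shape as the document in the question.
const sample = [
  { _id: 'a', 0: 'c', 1: 'a', 2: 'n', 3: 'c', 4: 'e', 5: 'r' },
  { _id: 'b', 0: 'd', 1: 'a', 2: 'n', 3: 'c', 4: 'e', 5: 'r' },
  { _id: 'c', 0: 'c', 1: 'a', 2: 'n', 3: 'c', 4: 'e', 5: 'r' },
];

// Logs the _id of the second "cancer" document only.
console.log(findDuplicateIds(sample));
```

In the mongo shell you could then, under the same assumptions, pass db.your_collection.find().toArray() to the function and delete the returned ids with db.your_collection.remove({_id: {$in: ids}}). As with the index approach, try it on a clone of the collection first.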
nickmilon