Suppose we have a MongoDB collection containing three documents:

db.collection.find()

 { _id:'...', user: 'A', title: 'Physics',   Bank: 'Bank_A' }
 { _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
 { _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }

We have a new document:

 doc = { user: 'B', title: 'Chemistry', Bank:'Bank_A' }

If we use

 db.collection.insert(doc) 

then this duplicate document gets inserted into the database:

 { _id:'...', user: 'A', title: 'Physics',   Bank: 'Bank_A' }
 { _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
 { _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }
 { _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }

How can this duplicate be prevented? Which field(s) should be indexed, or is there another approach?

Community
shashank
  • possible duplicate of [Mongodb avoid duplicate entries](http://stackoverflow.com/questions/12191311/mongodb-avoid-duplicate-entries) – John Petrone Jun 09 '14 at 15:06
  • @John Petrone: here we cannot put a unique index on any single field, because the values of each field repeat. {unique:true} would cause problems. – shashank Jun 09 '14 at 15:13
  • Use a compound index http://docs.mongodb.org/manual/tutorial/create-a-compound-index/ – John Petrone Jun 09 '14 at 15:15

6 Answers

Don't use insert.

Use update with upsert: true. update looks for a document that matches your query and modifies the fields you specify. If you also pass upsert: true, it inserts a new document when no document matches the query.

db.collection.update(
  <query>,
  <update>,
  {
    upsert: <boolean>,
    multi: <boolean>,
    writeConcern: <document>
  }
)

So, for your example, you could use something like this:

db.collection.update(doc, doc, {upsert:true})
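
To make the behaviour concrete, here is a minimal in-memory sketch in plain JavaScript of what "replace if found, insert if not" amounts to when the same document is used as both query and replacement. It is a toy model, not the MongoDB driver: `matchesFilter` and `upsert` are hypothetical helpers, and the collection is just an array.

```javascript
// Toy model only: the "collection" is a plain array, not a MongoDB collection.

// A stored document matches when every field in the filter has an equal value.
function matchesFilter(stored, filter) {
  return Object.keys(filter).every((field) => stored[field] === filter[field]);
}

// Replace the first matching document, or insert the replacement if none matches.
function upsert(collection, filter, replacement) {
  const index = collection.findIndex((stored) => matchesFilter(stored, filter));
  if (index === -1) {
    collection.push({ ...replacement });
    return "inserted";
  }
  collection[index] = { ...replacement };
  return "replaced";
}

const collection = [
  { user: "A", title: "Physics", Bank: "Bank_A" },
  { user: "A", title: "Chemistry", Bank: "Bank_B" },
  { user: "B", title: "Chemistry", Bank: "Bank_A" },
];

// The document from the question already exists, so it is replaced, not duplicated.
const doc = { user: "B", title: "Chemistry", Bank: "Bank_A" };
const firstResult = upsert(collection, doc, doc);

// A document with no match is inserted.
const newDoc = { user: "C", title: "Maths", Bank: "Bank_A" };
const secondResult = upsert(collection, newDoc, newDoc);

console.log(firstResult, secondResult, collection.length);
```

In this model the existing copy is overwritten rather than duplicated. Because the whole stored document is swapped for the replacement, any stored field that is missing from the replacement is lost.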
d-_-b
Vic
  • Be aware that upsert can be dangerous if you expect to be priming a document for first use, since it will happily wipe out anything that is stored there, in favor of the upserted values. The pattern that John P. proposed is a better general answer for avoiding duplicate records, though either approach would be adequate for the simple case presented where the compound key would include all existing document fields. If you consider the very commonly needed addition of a 'creation time' for the record, you can see how this breaks down for many general cases... – J. Paulding Jul 28 '17 at 17:58
  • I humbly request everyone to refer to John Petrone's answer too! – Arun Aug 20 '20 at 13:54

You should use a compound index on the set of fields that uniquely identify a document within your MongoDB collection. For example, if you decide that the combination of user, title and Bank is your unique key, you would issue the following command:

db.collection.createIndex( { user: 1, title: 1, Bank: 1 }, {unique:true} )

Please note that this should be done after you have removed previously stored duplicates.

http://docs.mongodb.org/manual/tutorial/create-a-compound-index/

http://docs.mongodb.org/manual/tutorial/create-a-unique-index/
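
Before the unique index can be created, the existing duplicates have to be found and removed. The following is a minimal sketch in plain JavaScript that works on an array of already-fetched documents rather than on MongoDB itself. `findDuplicates` and the numeric `_id` values are hypothetical, introduced only for illustration.

```javascript
// Return every document whose compound key was already seen earlier in the list.
// keyFields lists the fields that together should be unique.
function findDuplicates(docs, keyFields) {
  const seen = new Set();
  const duplicates = [];
  for (const doc of docs) {
    // Serialize the key values in a fixed field order so equal keys compare equal.
    const key = JSON.stringify(keyFields.map((field) => doc[field]));
    if (seen.has(key)) {
      duplicates.push(doc);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// The four documents from the question, with made-up _id values.
const docs = [
  { _id: 1, user: "A", title: "Physics", Bank: "Bank_A" },
  { _id: 2, user: "A", title: "Chemistry", Bank: "Bank_B" },
  { _id: 3, user: "B", title: "Chemistry", Bank: "Bank_A" },
  { _id: 4, user: "B", title: "Chemistry", Bank: "Bank_A" },
];

const duplicates = findDuplicates(docs, ["user", "title", "Bank"]);
console.log(duplicates.map((doc) => doc._id));
```

The first document with a given key is kept and later copies are reported, so the returned `_id` values are the ones that would need to be deleted before the index is built.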

Vorticity
John Petrone
  • @Roberto ensureIndex creates the index if it doesn't exist already – Azmisov Dec 13 '16 at 21:45
  • Oh, you're right @Azmisov, but ensureIndex has been deprecated since 3.0 and is now a (deprecated) alias for createIndex. Since 3.0 did not exist when this answer was written, I'm removing my downvote :) (It doesn't allow me to undo my downvote until the answer is edited, so let's hope John Petrone adds this as an update.) – Roberto Dec 13 '16 at 22:01
  • Thank you this is exactly what I needed and confirmed working after trying to insert the same item. It now throws an error properly. – james-see Mar 27 '23 at 23:58

This is an update to the answers above.

Please use db.collection.updateOne() instead of db.collection.update(), and db.collection.createIndexes() instead of db.collection.ensureIndex().

Update: the methods update() and ensureIndex() have been deprecated since mongodb 2.*. You can see more details in the mongodb source, at the path ./mongodb/lib/collection.js. For update(), the recommended methods are updateOne, updateMany, or bulkWrite. For ensureIndex(), the recommended method is createIndexes.

Creem

This may be a bit slower than the other approaches, but it works too, and it can be used inside a loop:

db.collection.replaceOne(query, data, {upsert: true})

The query may be something like:

{ _id: '5f915390950f276680720b57' }

https://docs.mongodb.com/manual/reference/method/db.collection.replaceOne

Banzy

What you are looking for is AddToSet instead of Push or Insert. Using the Upsert flag doesn't seem to work for me.

For example: var updateSet = Builders<T>.Update.AddToSet(collectionField, value);

Note that AddToSet seems to do a value comparison.

John Doe

Setting your document's _id key to be the unique identifier and using collection.insert_many(documents, ordered=False) lets you bulk insert and prevent duplicates at the same time.

For example:

documents = [{'_id':'hello'}, {'_id':'world'}, {'_id':'hello'}]

collection.insert_many(documents, ordered=False)

ordered=False is important. According to the documentation, if ordered=True then mongo will stop attempting to insert as soon as it encounters a duplicate _id. If ordered=False, mongo will attempt to insert all documents.
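
The difference between the two modes can be sketched with a small in-memory model in plain JavaScript. This is not the driver's implementation: `insertMany` here is a hypothetical stand-in that uses a Map keyed by _id in place of a collection, and the duplicate is placed second in the batch so that the two modes give different results.

```javascript
// Toy model only: `store` is a Map keyed by _id, standing in for a collection.
function insertMany(store, documents, ordered) {
  const result = { inserted: [], duplicates: [] };
  for (const doc of documents) {
    if (store.has(doc._id)) {
      result.duplicates.push(doc._id);
      if (ordered) {
        break; // ordered: stop at the first duplicate _id
      }
      continue; // unordered: skip the duplicate and keep going
    }
    store.set(doc._id, doc);
    result.inserted.push(doc._id);
  }
  return result;
}

const documents = [{ _id: "hello" }, { _id: "hello" }, { _id: "world" }];

const orderedResult = insertMany(new Map(), documents, true);
const unorderedResult = insertMany(new Map(), documents, false);

console.log(orderedResult.inserted);   // only "hello": the batch stopped at the duplicate
console.log(unorderedResult.inserted); // "hello" and "world": the duplicate was skipped
```

With the real driver the duplicates are still reported after the whole batch has been attempted (PyMongo raises BulkWriteError), so the call generally needs error handling around it.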