I have created a collection and added a unique index like this:

db.user_services.createIndex({ "uid": 1, "sid": 1 }, { unique: true, dropDups: true })

The collection "user_services" looks something like this:

{
 "_id" : ObjectId("55068b35f791c7f81000002d"),
 "uid" : 15,
 "sid" : 1,
 "rate" : 5
},
{
 "_id" : ObjectId("55068b35f791c7f81000002f"),
 "uid" : 15,
 "sid" : 1,
 "rate" : 4
}

Problem:

I am using the PHP driver to insert documents with the same uid and sid, and they are getting inserted.

What I want:

  1. On the mongo shell: add a unique index on uid and sid so that no two documents can have the same uid and sid.
  2. On the PHP side: something like MySQL's "insert (value) on duplicate key update rate=rate+1". That is, whenever I try to insert a document, it should be inserted if it is not already there; otherwise the rate field of the existing document should be updated.
Neil Lunn
Raj Nandan Sharma

4 Answers

Congratulations, you appear to have found a bug. This only happens with MongoDB 3.0.0 in my testing, or at least it is not present in MongoDB 2.6.6. Bug now recorded at SERVER-17599.

NOTE: Not actually an "issue" but confirmed as "by design". The option was dropped for version 3.0.0, though it is still listed in the documentation.

The problem is that the index is not created, and an error is raised, when you attempt to create it on a collection that already contains duplicates on the "compound key" fields. With the data above, the index creation should yield this in the shell:

{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "errmsg" : "exception: E11000 duplicate key error dup key: { : 15.0, : 1.0 }",
    "code" : 11000,
    "ok" : 0
}

When there are no duplicates present, you can create the index as you are currently trying and it will be created.
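When it succeeds, the shell reports something like this instead (a sketch; the exact fields can vary a little by version):

{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}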

So to work around this, first remove the duplicates with a procedure like this:

db.events.aggregate([
    // Group on the compound key fields and collect the _id of every document
    { "$group": {
        "_id": { "uid": "$uid", "sid": "$sid" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    // Keep only the groups that contain more than one document
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();      // retain the first _id of each group
    db.events.remove({ "_id": { "$in": doc.dups } });
});

db.events.createIndex({ "uid": 1, "sid": 1 }, { unique: true })

Then further inserts containing duplicate data will be rejected, and the appropriate duplicate key error will be reported.
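As a hedged illustration, attempting to re-insert one of the documents from the question should now fail in the shell roughly like this (the exact errmsg text varies by version):

db.events.insert({ "uid": 15, "sid": 1, "rate": 5 })

WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error index: test.events.$uid_1_sid_1 dup key: { : 15.0, : 1.0 }"
    }
})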

The final note here is that "dropDups" was never a very elegant solution for removing duplicate data: the deletion is non-deterministic, so you really want something with more control, as demonstrated above.

For the second part, rather than use .insert(), use the .update() method. It has an "upsert" option:

$collection->update(
    array( "uid" => 1, "sid" => 1 ),   // query: match on the compound key
    array( '$set' => $someData ),      // update to apply
    array( 'upsert' => true )          // insert when no document matches
);

So the "found" documents are "modified" and the documents not found are "inserted". Also see $setOnInsert for a way to only create certain data when the document is actually inserted and not when modified.


For your specific attempt, the correct syntax of .update() takes three arguments: "query", "update" and "options":

$collection->update(
    array( "uid" => 1, "sid" => 1 ),                        // query
    array(                                                  // update
        '$set' => array( "field" => "this" ),
        '$inc' => array( "counter" => 1 ),
        '$setOnInsert' => array( "newField" => "another" )
    ),
    array( "upsert" => true )                               // options
);

None of the update operations is allowed to "access the same path" as another update operation within that "update" document.
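This is also why the attempt shown in the comments below fails: $set and $setOnInsert cannot both write to the same field. A hedged shell sketch of that rejected form (the trate field comes from the comment; the exact error wording varies by server version):

db.user_services.update(
    { "uid": 15, "sid": 1 },
    {
        "$set": { "trate": 2 },
        "$setOnInsert": { "trate": 0 }    // conflict: writes the same path as $set
    },
    { "upsert": true }
)
// fails with an error such as:
// "Cannot update 'trate' and 'trate' at the same time"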

Neil Lunn
    The dropDups option was [removed in MongoDB 3.0](http://docs.mongodb.org/manual/release-notes/3.0-compatibility/#remove-dropdups-option). See also: [SERVER-14710](https://jira.mongodb.org/browse/SERVER-14710). Non-deterministic deletion of data was an unexpected side effect of an index option. – Stennie Mar 16 '15 at 09:08
  • @Stennie. Makes sense. Never used it myself for that very reason. Still in the [documentation](http://docs.mongodb.org/manual/tutorial/create-a-unique-index/) though which would be the source of the problem. – Neil Lunn Mar 16 '15 at 09:10
  • Thanks, good catch on the docs! I'll [raise a Jira](https://jira.mongodb.org/browse/DOCS) for it :) – Stennie Mar 16 '15 at 09:15
  • Thanks, it worked. Can you also solve the second part of my problem? Can I update a duplicate entry in a single query? – Raj Nandan Sharma Mar 16 '15 at 09:17
  • @Raj Got caught up in all that. What you really want is an "upsert". – Neil Lunn Mar 16 '15 at 09:19
  • Found several lingering references to dropDups (will grep the 3.0 manual source to find any missing). Docs ticket FYI: https://jira.mongodb.org/browse/DOCS-5012 – Stennie Mar 16 '15 at 09:23
  • @Raj Updated with example. – Neil Lunn Mar 16 '15 at 09:27
  • @NeilLunn I tried this but MongoDB does not allow it – Raj Nandan Sharma Mar 16 '15 at 10:50
  • $collection->update( array( "uid" => 15, "sid" => 1 ), array( '$set' => array('trate' => 2) ), array( '$setOnInsert' => array('trate' => 0) ), array( 'upsert' => true ) ); – Raj Nandan Sharma Mar 16 '15 at 10:51
  • @Raj See the included example for an explanation. Differing values here do not make any sense in this context. You either want to update with present values on some fields, or only set certain fields on insertion and not touch them on update. – Neil Lunn Mar 16 '15 at 11:08
  • @NeilLunn: There should be a small correction in the aggregation query's last part: db.events.remove({"_id" : { "$in": doc.dups }}); – Yadvendar Sep 22 '15 at 05:35
  • That last part didn't work for me, so here's what I did: .forEach(function(doc) { u_id = doc.dups.shift();//Get the id printjson(u_id) db.followers.remove({ "_id": u_id });//Remove duplicate by id }); – Leon Oct 20 '15 at 15:46

I feel like the current most popular answer is a little too local and detailed for such an elementary MongoDB operation: removing duplicates from MongoDB by a key.

Removing duplicates by a key in MongoDB 3.0+ is simple. Just run this query, replacing yourCollection and yourDuplicateKey, and assuming _id is your primary key (make sure you mongodump first, just in case):

db.yourCollection.aggregate([
    { "$group": {
        "_id": { "yourDuplicateKey": "$yourDuplicateKey" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();
    db.yourCollection.remove({ "_id": {"$in": doc.dups }});
});
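Once the duplicates are gone, you can create the unique index so they cannot come back; a minimal sketch, assuming the single key field used above:

db.yourCollection.createIndex({ "yourDuplicateKey": 1 }, { unique: true })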
chakeda
  • Solid answer! And also opened my mind about how to think about MongoDB queries. Mongo gets a lot of flak but the ability to write code like this in the shell is incredible. – aeskreis Aug 13 '20 at 19:04
  • I know I was being stupid, but should also mention `yourCollection` needs to be replaced. Also got this warning: `'DeprecationWarning: Collection.remove() is deprecated. Use deleteOne, deleteMany, findOneAndDelete, or bulkWrite.'` – Itay Dec 28 '22 at 12:43

Another simple way of avoiding duplicate records using multiple fields:

Example: using the following code, one can avoid duplicate values for the fields "StudentName" and "ParentName":

    $DataForDB = array(
        "AdmissionNo" => $admissionNo,
        "StudentName" => $StudentName,
        "ParentName"  => $ParentName
    );
    // Insert only when no document with the same StudentName and ParentName exists
    if (empty($Coll->findOne(array( "StudentName" => $StudentName, "ParentName" => $ParentName )))) {
        $Coll->insertOne($DataForDB);
    }

In this case we check whether a document with those fields already exists: if it does, the data is not entered into the DB; if it does not, the data is inserted.

Tech guy

In my case (a fairly large collection), I had to add { allowDiskUse: true } to make it work:

db.yourCollection.aggregate([
    { "$group": {
        "_id": { "yourDuplicateKey": "$yourDuplicateKey" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
], { allowDiskUse: true }).forEach(function(doc) {
    doc.dups.shift();
    db.yourCollection.remove({ "_id": { "$in": doc.dups } });
});
kriscondev