
I am running the query below in Robomongo, but it is giving the error shown below. I am trying to remove the duplicate entries in the url field using this query. Is there any problem with my query?

db.dummy_data.createIndex({"url":1},{unique:true},{dropDups:true})

My error is E11000 duplicate key error index: mydb.dummy_data.$url_1 dup key: {"some url"}

  • Well this is JavaScript and not Java. Two completely different things. But your syntax is wrong. Should be `db.dummy_data.ensureIndex({ "url": 1},{ "unique": true, "dropDups: true })` and that is why the third object is being ignored. – Neil Lunn Jan 23 '15 at 10:46
  • But it gives this error: "connectionId" : 336, "err" : "E11000 duplicate key error index: client_mahout.dummy_data.$product_url_1 dup key: { : \"http://dl.google.com/dl/2010kharido-back-cover-apple-ipad-air-2-2nd-gen/p/itme356uphtpyfhc?pid=ACCE356U6DZFUJBX\" }", "code" : 11000, "n" : 0, "ok" : 1 – Juhan Jan 23 '15 at 10:52
  • Beware that dropping the duplicates is arbitrary. – nothing Jan 23 '15 at 10:53
  • Typo which you should have spotted: `db.dummy_data.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true })`. See: http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/ – Neil Lunn Jan 23 '15 at 10:53
  • Also what @nothing said. You have no control over what gets dropped. – Neil Lunn Jan 23 '15 at 10:54
  • @Neil Lunn I went through it and now I am getting the error E11000 duplicate key error index: client_mahout.dummy_data.$product_url_1 dup key: {"someurl"} "code" : 11000, "n" : 0, "ok" : 1 – Juhan Jan 23 '15 at 10:56
  • @CharlotteEden You are still doing something wrong. This works everywhere. Perhaps update your question to show what you are actually doing now. – Neil Lunn Jan 23 '15 at 10:58
  • @Neil Lunn I just updated my query as `db.dummy_data.ensureIndex({"product_url":1},{unique:true,dropDups:true})` and then it shows this: { "connectionId" : 336, "err" : "too may dups on index build with dropDups=true", "code" : 10092, "n" : 0, "ok" : 1 } How do I cope with this? – Juhan Jan 23 '15 at 11:06
  • @CharlotteEden Then that is a new error that I have not seen before. But to avoid confusion with your original question, you should really post that as another question to make the distinction clear. The error you experienced originally is a result of not issuing the command with the correct syntax. – Neil Lunn Jan 23 '15 at 11:20
  • Actually hold that. Let's include everything for a definitive answer. – Neil Lunn Jan 23 '15 at 11:25
  • For others having this problem, check your mongo version with `db.version()`. If you are running Mongo 3 and are trying to use `dropDups` to clear duplicates, it will ignore dropDups and give you this error. – Muhd Jun 11 '15 at 14:27

1 Answer

So when your syntax is corrected to:

db.dummy_data.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true })

You report that you still get an error message, but a new one:

{ "connectionId" : 336, "err" : "too may dups on index build with dropDups=true", "code" : 10092, "n" : 0, "ok" : 1 }

There is this message on Google Groups which leads to the suggested method:

Hi Daniel,

The assertion indicates that the number of duplicates met or exceeded 1000000. In addition, there's a comment in the source that says, "we could queue these on disk, but normally there are very few dups, so instead we keep in ram and have a limit." (where the limit == 1000000), so it might be best to start with an empty collection, ensureIndex with {dropDups: true}, and reimport the actual documents.

Let us know if that works better for you.

So as that suggests, create a new collection and import everything into it. Basic premise:

// Build the unique index first, while newdata is still empty
db.newdata.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true });

// Copy everything across; a document whose url has already been
// seen fails the unique index and is simply not inserted
db.dummy_data.find().forEach(function(doc) {
    db.newdata.insert(doc);
});

Or better yet:

// Same unique index on the empty target collection
db.newdata.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true });

// Unordered bulk operations attempt every insert in a batch, so a
// duplicate key error does not stop the remaining documents
var bulk = db.newdata.initializeUnorderedBulkOp();
var counter = 0;

db.dummy_data.find().forEach(function(doc) {
    counter++;
    bulk.insert( doc );

    // Ship the inserts to the server in batches of 1000
    if ( counter % 1000 == 0 ) {
        bulk.execute();
        bulk = db.newdata.initializeUnorderedBulkOp();
    }
});

// Flush the final partial batch, if any
if ( counter % 1000 != 0 )
    bulk.execute();
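
Either way, a quick sanity check afterwards (my addition, using the collection names above) is to compare the two counts; the difference is the number of duplicates that were dropped:

db.dummy_data.count();  // original, with duplicates
db.newdata.count();     // de-duplicated copy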

However you approach the migration from one collection to another, with a high volume of duplicates on a unique key, this seems to be the only way of handling it at present.
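
If the copy checks out, a possible final step (my addition, not from the original answer, and assuming the names used above) is to swap the de-duplicated collection into place:

// Irreversible: verify newdata before dropping the original
db.dummy_data.drop();
db.newdata.renameCollection("dummy_data");

The unique index travels with the rename, so the url field stays protected against future duplicates.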

  • FYI, the [`dropDups`](http://docs.mongodb.org/manual/core/index-creation/#index-creation-duplicate-dropping) option only applies when building a unique index on a collection with existing data. If you are copying into a new collection the normal duplicate key exceptions are raised (which you can choose to ignore). It's also worth noting that `dropDups` support was removed during the 2.7.x development cycle as per [SERVER-14710](https://jira.mongodb.org/browse/SERVER-14710). Non-deterministic data deletion can lead to some unexpected consequences (particularly with typos or usage errors). – Stennie Jan 23 '15 at 13:11
  • @Stennie So where's the up-vote for providing a practical solution to the problem? – Neil Lunn Jan 23 '15 at 13:13
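
As Muhd and Stennie note above, `dropDups` is ignored from MongoDB 3.0 onwards, having been removed during the 2.7.x cycle. On those versions one alternative, sketched here assuming the collection is still dummy_data (test on a copy first), is to find the duplicated url values with the aggregation framework and delete all but one document per value:

// Group by url, collecting each group's _ids, and keep only the
// urls that occur more than once
db.dummy_data.aggregate([
    { "$group": {
        "_id": "$url",
        "ids": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } } }
], { "allowDiskUse": true }).forEach(function(group) {
    // Keep the first document for this url and remove the rest
    group.ids.shift();
    db.dummy_data.remove({ "_id": { "$in": group.ids } });
});

Once no duplicates remain, the unique index can be built without `dropDups`.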