So when your syntax is corrected from the incorrect usage to:
db.dummy_data.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true })
You report that you still get an error message, but a new one:
{ "connectionId" : 336, "err" : "too may dups on index build with dropDups=true", "code" : 10092, "n" : 0, "ok" : 1 }
There is this message on Google Groups which leads to the suggested method:
Hi Daniel,
The assertion indicates that the number of duplicates met or exceeded 1000000. In addition, there's a comment in the source that says, "we could queue these on disk, but normally there are very few dups, so instead we keep in ram and have a limit." (where the limit == 1000000), so it might be best to start with an empty collection, ensureIndex with {dropDups: true}, and reimport the actual documents.
Let us know if that works better for you.
So as that suggests, create a new collection and import everything into it. The basic premise:
// Build the unique index on the empty target collection first
db.newdata.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true });
// Copy across; duplicates on "url" are simply rejected by the unique index
db.dummy_data.find().forEach(function(doc) {
    db.newdata.insert(doc);
});
Or better yet, use the Bulk Operations API (available from MongoDB 2.6) to cut down on the round trips to the server:
db.newdata.ensureIndex({ "url": 1},{ "unique": true, "dropDups": true });
var bulk = db.newdata.initializeUnorderedBulkOp();
var counter = 0;
db.dummy_data.find().forEach(function(doc) {
    counter++;
    bulk.insert( doc );
    // Send the queued inserts to the server every 1000 documents
    if ( counter % 1000 == 0 ) {
        bulk.execute();
        bulk = db.newdata.initializeUnorderedBulkOp();
    }
});
// Flush any documents left in the final partial batch
if ( counter % 1000 != 0 )
    bulk.execute();
However you approach the migration from one collection to another, copying into a new collection that already has the unique index seems to be the only way of handling a high volume of duplicates on a unique key at present.
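Once the copy has finished, you will probably want the de-duplicated data back under the original name. As a rough sketch, and assuming you have checked the contents of newdata first, the shell's renameCollection() helper can swap it into place:

// Sanity check: there should now be one document per distinct "url"
db.newdata.count()

// Swap the new collection into place; the second argument (dropTarget)
// removes the original dummy_data collection, so only run this once you
// are satisfied with the copy
db.newdata.renameCollection("dummy_data", true)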