I would like to know the best method for removing duplicate documents from a large GeoJSON collection (approximately 80,000 lines) stored in MongoDB. I believe the duplicates are causing an error on the front end, as I cannot log the full collection to the console.
I have tried to use the dropDups option in the mongo shell, as explained in MongoDB query to remove duplicate documents from a collection, but have had no success. Also, I believe dropDups was deprecated as of MongoDB 2.6 and removed in 3.0.
Here is a sample of my schema structure:
{
  "type": "FeatureCollection",
  "features": [
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.994720, 40.686902]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.994720, 40.686902]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.989205, 40.686675]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.994655, 40.687391]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.985557, 40.687683]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.985557, 40.687683]
      }
    },
    {
      "geometry": {
        "type": "Point", "coordinates": [-73.984656, 40.685462]
      }
    }
  ]
}
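As a fallback, I know I could dedupe the features array client-side before inserting anything. A quick sketch of what I mean (the function name dedupeFeatures is just mine for illustration, not part of my actual app):

```javascript
// Hypothetical sketch: remove duplicate features from a GeoJSON
// FeatureCollection by using the serialised coordinates as a key.
function dedupeFeatures(featureCollection) {
  const seen = new Set();
  const unique = [];
  for (const feature of featureCollection.features) {
    // Two points with identical coordinate arrays serialise identically.
    const key = JSON.stringify(feature.geometry.coordinates);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(feature);
    }
  }
  return { type: featureCollection.type, features: unique };
}
```

But that only helps on future inserts; I would still prefer a database-level fix for the data already in the collection.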
Here are my createIndex attempts in the mongo shell, and the duplicates still remain (note that I passed dropdups rather than dropDups, though from what I've read the option no longer works in either spelling):
> db.testschema.createIndex( { coordinates: 1 }, { unique: true, dropdups: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.testschema.createIndex( { geometry: 1 }, { unique: true, dropdups: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
> db.testschema.ensureIndex({'testschema.features.geometry.coordinates': 1}, {unique: true, dropdups: true})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}
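The direction I am now considering instead is grouping documents by coordinates and deleting every _id after the first in each group. Here is a plain-JavaScript sketch of that logic over an in-memory array (findDuplicateIds is my own name, not a MongoDB API; the real version would presumably be an aggregation $group plus a bulk remove against the server):

```javascript
// Hypothetical helper: given docs that each carry an _id and
// geometry.coordinates, return the _ids of every duplicate beyond the
// first occurrence. This mirrors what a group-by-coordinates pass
// followed by a remove of the surplus _ids would do.
function findDuplicateIds(docs) {
  const firstSeen = new Map(); // coordinates key -> _id of first doc kept
  const toDelete = [];
  for (const doc of docs) {
    const key = JSON.stringify(doc.geometry.coordinates);
    if (firstSeen.has(key)) {
      toDelete.push(doc._id); // keep the first doc, mark the rest
    } else {
      firstSeen.set(key, doc._id);
    }
  }
  return toDelete;
}
```

Is this group-and-delete approach the right way to go, or is there a cleaner server-side method?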