
I would like to know the best way to remove duplicate documents from a large GeoJSON collection (approx. 80k lines) that I have stored in MongoDB. I believe the duplicates are causing an error on the front end, as I cannot log the full collection to the console.

I have tried the dropDups option in the mongo shell, as explained in the following question, but have had no success: MongoDB query to remove duplicate documents from a collection. Also, I believe dropDups is deprecated as of MongoDB 2.6.

Here is a sample of my schema structure:

{
  "type": "FeatureCollection",
     "features": [
        {

           "geometry": {
              "type": "Point","coordinates": [-73.994720, 40.686902]
           }
        },
        {

           "geometry": {
              "type": "Point","coordinates": [-73.994720, 40.686902]
           }
        },
        {

           "geometry": {
              "type": "Point","coordinates": [-73.989205, 40.686675]
           }
        },
        {

           "geometry": {
              "type": "Point","coordinates": [-73.994655, 40.687391]               
           }
        },
        {
           "geometry": {
              "type": "Point","coordinates": [-73.985557, 40.687683]               
           }
        },
        {

           "geometry": {
              "type": "Point","coordinates": [-73.985557, 40.687683]
           }
        },
        {
           "geometry": {
              "type": "Point","coordinates": [-73.984656, 40.685462]
           }
        }
        ]
}

Here are my createIndex attempts in the mongo shell. The duplicates still remain:

> db.testschema.createIndex( { coordinates: 1 }, { unique: true, dropdups: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.testschema.createIndex( { geometry: 1 }, { unique: true, dropdups: true      } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
> db.testschema.ensureIndex({'testschema.features.geometry.coordinates': 1}, {unique: true, dropdups: true})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}

teslorg
  • The dropdups method should work. Could you post the index you tried to create? – ZeMoon May 14 '15 at 05:48
  • Indexes do not work at the subdocument level; as such, dropdups will only work across documents, not across duplicates within subdocuments. As a long shot you could do: `db.c.ensureIndex({'geofield.features.geometry.coordinates': 1}, {unique: true, dropdups: true})` but it will probably remove the document – Sammaye May 14 '15 at 07:14
  • Thanks for the reply! I added the create index attempts to the code above. I am ok with removing the document itself however I am unsure what document you are referring to. Is the document you are referring to the "geometry" document? You can see my attempt above. Thanks in advance – teslorg May 14 '15 at 20:36
  • ...However ideally I would like to remove the duplicate coordinates! A solution for this would be great – teslorg May 14 '15 at 23:29
  • For removing duplicates you can use [this solution](http://stackoverflow.com/a/33364353/1045444) – Somnath Muluk Oct 27 '15 at 09:49
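
Following Sammaye's point that a unique index only compares whole documents, not elements of an array inside one document, the duplicate coordinates could instead be filtered out of the `features` array directly. Below is a minimal sketch in plain JavaScript, assuming the whole FeatureCollection is stored as a single document shaped like the sample in the question. The helper name `dedupeFeatures` is hypothetical, and the code assumes a modern JavaScript runtime (for example Node.js).

```javascript
// Hypothetical helper: keep only the first feature seen for each coordinate pair.
// Returns a new FeatureCollection and leaves the input untouched.
function dedupeFeatures(featureCollection) {
  const seen = new Set();
  const features = featureCollection.features.filter((feature) => {
    // Serialize the coordinates array so it can serve as a Set key.
    const key = JSON.stringify(feature.geometry.coordinates);
    if (seen.has(key)) {
      return false; // duplicate coordinates: drop this feature
    }
    seen.add(key);
    return true;
  });
  return { ...featureCollection, features };
}

// The sample data from the question (7 features, 2 of them duplicates).
const sample = {
  type: "FeatureCollection",
  features: [
    { geometry: { type: "Point", coordinates: [-73.994720, 40.686902] } },
    { geometry: { type: "Point", coordinates: [-73.994720, 40.686902] } },
    { geometry: { type: "Point", coordinates: [-73.989205, 40.686675] } },
    { geometry: { type: "Point", coordinates: [-73.994655, 40.687391] } },
    { geometry: { type: "Point", coordinates: [-73.985557, 40.687683] } },
    { geometry: { type: "Point", coordinates: [-73.985557, 40.687683] } },
    { geometry: { type: "Point", coordinates: [-73.984656, 40.685462] } },
  ],
};

const result = dedupeFeatures(sample);
console.log(result.features.length); // 5
```

The cleaned array could then be written back to the stored document, for instance with a `$set` on the `features` field. That step assumes the single-document layout described above and is not shown here.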

0 Answers