Mongo Distinct Query with full row object

Question

first of all i'm new to mongo so I don't know much and i cannot just remove duplicate rows due to some dependencies.

I have following data stored in mongo

{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}

you can see some of the rows are duplicate with different id as long as it will take to solve this issue from input I must tackle it on output.

I need the data in the following way:

{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}

My query

keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {$in: keys}})

As you can see it takes two queries for a same result set Please combine it to one as the database is very large

I might also create a unique key on the key but the value is so long (152 characters) that it will not help me.

Or it will??

Possible duplicate of [Remove Duplicates from MongoDB](http://stackoverflow.com/questions/31557053/remove-duplicates-from-mongodb) — Blakes Seven, Mar 08 '16 at 21:14
Maybe not "exactly" in the "removal" part, but certainly in the identification part. Basically you need to `$group` on the common field(s) and then return any other ancillary fields with `$first` — Blakes Seven, Mar 08 '16 at 21:15

score 5 · Accepted Answer · answered Mar 08 '16 at 21:15

You need to use the aggregation framework for this. There are multiple ways to do this, the solution below uses the $$ROOT variable to get the first document for each group:

db.data.aggregate([{
  "$sort": {
    "_id": 1
  }
}, {
  "$group": {
    "_id": "$key",
    "first": {
      "$first": "$$ROOT"
    }
  }
}, {
  "$project": {
    "_id": 0,
    "id":"$first.id",
    "key":"$first.key",
    "name":"$first.name",
    "country":"$first.country"
  }
}])

Mongo Distinct Query with full row object

1 Answers1