I have a MongoDB database of about 400 GB. The documents contain a variety of fields, but the important one here is an array of IDs.
A document might look like this:
    {
        "name": "bob",
        "dob": "1/1/2011",
        "key": [
            "1020123123",
            "1234123222",
            "5021297723"
        ]
    }
The field of interest here is "key". There are about 10 billion keys in total across 50 million documents (so each document has about 200 keys). Keys can repeat, and there are about 15 million unique keys.
What I would like to do is return the 10,000 most common keys. I thought aggregate might do this, but I'm having a lot of trouble getting it to run. Here is my code:
    db.users.aggregate([
        { $unwind: "$key" },
        { $group: { _id: "$key", number: { $sum: 1 } } },
        { $sort: { number: -1 } },
        { $limit: 10000 }
    ]);
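To make the expected result concrete, here is a plain JavaScript sketch of the same count-and-rank logic on a three-document toy sample. The documents other than bob and the topKeys helper are made up for illustration, and everything is held in memory, so this only shows the output shape I'm after, not a way to process the real data.

```javascript
// Toy stand-in for the collection; the real documents have about 200 keys each.
const docs = [
  { name: "bob", key: ["1020123123", "1234123222", "5021297723"] },
  { name: "alice", key: ["1020123123", "5021297723"] }, // hypothetical document
  { name: "carol", key: ["1020123123"] },               // hypothetical document
];

// Hypothetical helper: count every key occurrence across all documents,
// then keep the n most frequent, mirroring $unwind / $group / $sort / $limit.
function topKeys(documents, n) {
  const counts = new Map();
  for (const doc of documents) {
    for (const k of doc.key) {
      counts.set(k, (counts.get(k) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([_id, number]) => ({ _id, number }))
    .sort((a, b) => b.number - a.number) // most frequent first
    .slice(0, n);
}

const result = topKeys(docs, 2);
console.log(result);
```

For the real collection, n would be 10000 and each entry would be one of the 15 million unique keys with its total count.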
Any ideas what I'm doing wrong?