I'm trying to get a list of the number of records that have arrays of varying size. I want to get the distribution of array sizes for all records so I can build a histogram like this:
| *
| *
documents | * *
| * * *
|_*__*__*___*__*___
2 5 6 23 47
Array Size
So the raw documents look something like this:
{hubs : [{stuff:0, id:6}, {stuff:1"}, .... ]}
{hubs : [{stuff:0, id:6}]}`
So far using the aggregation framework and some of the help here I've come up with
db.sitedata.aggregate([{ $unwind:'$hubs'},
{ $group : {_id:'$_id', count:{$sum:1}}},
{ $group : {_id:'$count', count:{$sum:1}}},
{ $sort : {_id: 1}}])
This seems to give me the results I want, but it's not very fast. I'm wondering if there is something I can do like this that may not need two group calls. The syntax is wrong here, but what I'm trying to do is put the count value in the first _id field:
db.sitedata.aggregate([{ $unwind:'$hubs'},
{ $group : {_id:{$count:$hubs}, count:1}},
{ $sort : { _id: 1 }}])