3

I'm trying to get a list of unique values from the 'type' field from my mongodb collection. Example documents below:

{
       "_id" : ...,
       "type" : "report",
       "tasks" : ...
}
{
       "_id" : ...,
       "type" : "research",
       "tasks" : ...
}
{
       "_id" : ...,
       "type" : "memo",
       "tasks" : ...
}
{
       "_id" : ...,
       "type" : "memo",
       "tasks" : ...
}
{
       "_id" : ...,
       "type" : "report",
       "tasks" : ...
}
{
       "_id" : ...,
       "type" : "report",
       "tasks" : ...
}

I'm looking for, ordered by frequency, the unique types that are in the type field of the documents, so:

["report", "memo", "research"]

What's the best way to do this? Hopefully I can do this by querying with mongo and not downloading the entire collection...

Gates VP
  • 44,957
  • 11
  • 105
  • 108
Mark
  • 32,293
  • 33
  • 107
  • 137

2 Answers2

11

On a standard SQL DBMS this would be done with the following query:

SELECT type, count(*) as ct FROM table GROUP BY type ORDER BY ct;

on mongodb this would be done using the group function, although it's slightly more complicated:

db.collection.group(
           {key: { "type":true},
            reduce: function(obj,prev) { prev.count += 1; },
            initial: { count: 0 }
            });

Here I'm asking the db to return values for the key "type" (hence the "true"), and for each value, the given reduce function will be used to aggregate the found records. Here I'm just updating a count of how many times each record appears. If you run this query you'll get something like this:

[
    {
        "type" : "report",
        "count" : 5
    },
    {
        "type" : "memo",
        "count" : 15
    }
    {
        "type" : "research",
        "count" : 3
    }

]

You'll notice this is not ordered; even the mongodb docs say that the easiest way to order it is to do it client-side.

Relevant documentation is here.

Roadmaster
  • 5,297
  • 1
  • 23
  • 21
  • This answer should work. Just some extra notes.This will be a slow query unless `type` is indexed. Even with an index, you're essentially going to have to "walk" the entire index. If this is a time-sensitive query, then this should be set up as a map-reduce and run on a schedule. – Gates VP Nov 28 '10 at 19:49
  • Gates VP is entirely right, be mindful of performance issues. A possible workaround is to keep a cache of the counts for each possible value of "type" (assuming you have them in another collection), and update this counter every time a record is added or removed. Means a small performance hit on adding/deleting records but if you need frequent access to your types and counts it will save time in the long run. – Roadmaster Nov 28 '10 at 20:15
  • Forgot to say thanks, thanks, this is exactly what i wanted. +1 – Mark Dec 01 '10 at 16:40
2

You can use distinct : http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct

There is an example in the php doc : http://php.net/manual/en/mongodb.command.php

$types = $db->command(array("distinct" => "yourCollection", "key" => "type"));

foreach ($types['values'] as $type) {
    echo "$type\n";
}

I don't know if the results are ordered by frequency.

Maxence
  • 12,868
  • 5
  • 57
  • 69
  • Would still need a way to obtain the frequencies for each unique value in order to sort them by frequency. I found no way to do that using distinct, hence the slightly more involved solution with group. – Roadmaster Nov 28 '10 at 19:05