I hope no one is using the accepted answer from Mariusz here, because it doesn't work, at least not in CouchDB.
CouchDB reduce functions also need to perform rereduces, that is, reducing the output of several other reduce calls.
Typical solution
Make your map function output a unique key, and then just reduce with the built-in _count. This is exactly what you suggested in your question, except with group=true.
This will count how many instances of each unique thing you have; each row represents one unique thing, and you can easily count the total number of rows in a list function.
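For example, assuming each document carries the field you want distinct values of (doc.tag is a hypothetical name here), the map function could be as simple as:

// Map: emit the value you want de-duplicated as the key.
function (doc) {
  if (doc.tag) {
    emit(doc.tag, null);
  }
}

With _count as the reduce function, querying the view with ?group=true returns one row per distinct tag together with its count.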
Alternatively
You may not wish to make the key unique. For example, with time series data you might want to query the unique values within a certain time range, in which case you have to include the datetime in the key.
Handling this case is trickier.
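For instance, a map function for that case might put the timestamp first in the key so the view can be range-queried by time (doc.datetime and doc.user are hypothetical field names):

// Map: the key is the timestamp, the value is the thing to de-duplicate.
function (doc) {
  if (doc.datetime && doc.user) {
    emit(doc.datetime, doc.user);
  }
}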
Option 1:
The naive solution is to not count the unique values in the reduce at all, but to build up a big list of them, a bit like the function below, and then count them in the client or in a list function afterwards.
function (keys, values, rereduce) {
  var unique = {};
  var getUniqueValues = function (vals) {
    for (var i = 0; i < vals.length; i++) {
      if (!(vals[i] in unique)) {
        unique[vals[i]] = null;
      }
    }
  };
  if (rereduce) {
    // On rereduce, each entry in values is itself an array of
    // unique values returned by an earlier reduce, so unpack it.
    for (var j = 0; j < values.length; j++) {
      getUniqueValues(values[j]);
    }
  } else {
    getUniqueValues(values);
  }
  return Object.keys(unique);
}
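Querying that reduce over a time range then returns a single row whose value is the array of distinct values, which you count in the client. A sketch, assuming the hypothetical view and key format above:

GET /db/_design/app/_view/unique_users?startkey="2012-01-01T00:00:00Z"&endkey="2012-01-31T23:59:59Z"

(The design document and view names are made up; the point is that startkey/endkey select the time window and the reduce de-duplicates within it.)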
Option 2:
Another option is to not reduce at all and just count the unique values in a list function; a sketch follows. As you say, this can get slow when there are lots of values.
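A sketch of such a list function, counting distinct values straight from the map rows:

// List: stream the view rows and count distinct values.
function (head, req) {
  var row, seen = {}, count = 0;
  while ((row = getRow())) {
    if (!(row.value in seen)) {
      seen[row.value] = null;
      count++;
    }
  }
  send(JSON.stringify({ distinct: count }));
}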
Option 3:
Avoiding excessive memory use when counting a large number of unique things is tricky.
It can be done by hashing each unique value to a bit in a bitmap and then counting how many 1s there are in the final bitmap.
This also lets you use a reduce function, because you can OR bitmaps together to combine your unique results; then, finally, in the client or in a list function, you count the 1s in the combined bitmap (see the sketch below).
I haven't tried this in CouchDB yet, but the theory is sound: http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html
The one caveat is that there may be a small error if the bitmap is not large enough, since two distinct values can hash to the same bit and be counted once. However, when you are counting very large numbers of things, a small error is often acceptable.
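Here is a minimal sketch of the bitmap idea as a CouchDB reduce function. The bitmap is kept as a plain array of 0s and 1s so it survives the JSON round-trip between reduce steps, and the hash is a crude illustrative one, not something to use in production:

// Reduce: hash each value to a bit; OR partial bitmaps on rereduce.
function (keys, values, rereduce) {
  var SIZE = 1024; // number of bits; larger means fewer collisions
  var bitmap = [];
  for (var b = 0; b < SIZE; b++) bitmap[b] = 0;

  // Crude string hash, for illustration only.
  var hash = function (s) {
    s = String(s);
    var h = 0;
    for (var i = 0; i < s.length; i++) {
      h = (h * 31 + s.charCodeAt(i)) % SIZE;
    }
    return h;
  };

  if (rereduce) {
    // Each entry in values is a bitmap from an earlier reduce; OR them.
    for (var j = 0; j < values.length; j++) {
      for (var k = 0; k < SIZE; k++) {
        bitmap[k] = bitmap[k] | values[j][k];
      }
    }
  } else {
    for (var v = 0; v < values.length; v++) {
      bitmap[hash(values[v])] = 1;
    }
  }
  return bitmap;
}

The client or a list function then counts the 1s in the returned array. As the caveat above says, values that hash to the same bit are counted once, so size the bitmap generously relative to the expected number of distinct values.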