Is there a good way to mimic the behavior of SELECT COUNT(DISTINCT field) in CouchDB?
Imagine we have the following document, which records the time at which a user played a certain song:
{
song_id: "happy birthday",
user_id: "boris",
date_played: [2011, 11, 14, 00, 12, 55],
_id: ...
}
I'd like to know the number of distinct songs ever played by our user "boris". If our user has listened to "happy birthday" 20 times, that song should still contribute just +1 to the overall song count.
In MySQL, I'd simply execute SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = "boris"
, but I'm drawing a blank when it comes to writing this in CouchDB.
Work-Around 1: If I changed my schema and instead stored all the song-plays inside a single user document for "boris" I could then write a map to emit only distinct values. However, if I wanted to build something on the scale of last.fm, my fear is that updates would start taking a very long time as the "boris" document size (number of plays) continued to grow. (There might also be a maximum document size that I would eventually hit).
Work-Around 2: I could also write a map function to return all of the distinct records, which my python script could sum up itself; but again with hundreds of thousands of distinct songs this would become very slow as well.
What other options am I missing?