Count is an expensive thing in RethinkDB to my experience. Especially for count operation that require iterating the whole data set. I myself struggle with this for a bit before.
To my understanding, when you pass groupData to distinct
, it creates an array, because groupData
will be a sequence, therefore it has 100,000 element limits.
To solve this, I think we have to use a stream, and count the stream instead. We cannot use group
because it returns a group of stream, or in other words, an array of stream to my understanding again.
So here is how I solve it:
- Create an index on the field I want to groups
- Call distnct on that table with the index.
- Map the stream, passing value into a count function with getAll, using index
An example query
r.table('t').distinct({index: 'index_name'})
.map(function(value) {
return {group: value, total: r.table('t').getAll(value, {index: 'index_name'}).count()}
})
With this, everything is a stream and we can lazily iterator over result set to get the count of each group.