I have some summation data that was very easy to generate using some relatively simple map/reduce views. But we want to sort the data based on the group-reduced view values (not the keys). It was suggested that we could use couchdb-lucene to do this. But how? It's not clear to me how to use a full text index to quickly rank this sort of data.
What we already have
An oversimplified example view looks something like the following:
by_sender: {
map: "function(doc) { emit(doc.sender, 1); }",
reduce: "function(keys, values, rereduce) { return sum(values); }"
}
Which returns results somewhat like the following (when run with group=true):
{"rows":[
{"key":"a@example.com","value":2},
{"key":"aaa@example.com","value":1},
{"key":"aaap@example.com","value":34},
{"key":"aabb@example.com","value":1},
... thousands or tens of thousands of rows ...
]}
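To make the group=true semantics concrete, here is a minimal in-memory emulation of what the server computes for the by_sender view. It is a sketch, not how CouchDB works internally. The helper names `mapBySender`, `reduceBySender` and `queryGrouped` are hypothetical, and the map function returns its pair instead of calling CouchDB's `emit()`. It also orders keys by plain JavaScript string order rather than CouchDB's ICU collation.

```javascript
// Hypothetical in-memory emulation of querying by_sender with group=true.
// CouchDB computes this server-side from its view index; this only mirrors
// the result: map every doc, group emitted rows by exact key, reduce each group.

// The by_sender map function, returning its [key, value] pair instead of
// calling CouchDB's emit().
function mapBySender(doc) {
  return [[doc.sender, 1]];
}

// The by_sender reduce function. CouchDB supplies sum() to view code; it is
// spelled out here. On the first (non-rereduce) pass, keys holds [key, docId] pairs.
function reduceBySender(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

function queryGrouped(docs) {
  const groups = new Map(); // key -> { keys: [[key, docId], ...], values: [...] }
  for (const doc of docs) {
    for (const [key, value] of mapBySender(doc)) {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      const group = groups.get(key);
      group.keys.push([key, doc._id]);
      group.values.push(value);
    }
  }
  // Rows are returned in key order. CouchDB collates strings with ICU; plain
  // code-unit order is used here, which can differ for mixed-case or
  // non-ASCII keys.
  const sortedKeys = [...groups.keys()].sort();
  return {
    rows: sortedKeys.map((key) => {
      const { keys, values } = groups.get(key);
      return { key, value: reduceBySender(keys, values, false) };
    }),
  };
}

const docs = [
  { _id: "1", sender: "aaa@example.com" },
  { _id: "2", sender: "a@example.com" },
  { _id: "3", sender: "a@example.com" },
];
console.log(JSON.stringify(queryGrouped(docs)));
// {"rows":[{"key":"a@example.com","value":2},{"key":"aaa@example.com","value":1}]}
```

The point to notice is that the reduced value only exists per group at query time. Nothing in the index is keyed by it, which is why ordering by value is not something the view can do on its own.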
What we want
Those are sorted by the key, but I need to sort the data according to the values, like so:
{"rows":[
{"key":"xyzzy@example.com","value":847},
{"key":"adam@example.com","value":345},
{"key":"karl@example.com","value":99},
{"key":"aaap@example.com","value":34},
... thousands or tens of thousands of rows ...
]}
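Done in memory, the desired ordering is just the grouped rows sorted by value, largest first. This is essentially the client-side option listed first under "More context". The helper name `sortRowsByValueDesc` and the tie-break on key are assumptions for illustration. The sketch below shows the transformation only; re-sorting every row on every query is the cost that becomes a problem at tens of thousands of rows.

```javascript
// Reorders grouped view rows so the largest value comes first. Ties are broken
// by key so the output is deterministic; that tie-break is an assumption, the
// question does not specify one. The input object is not modified.
function sortRowsByValueDesc(result) {
  const rows = [...result.rows].sort((a, b) => {
    if (a.value !== b.value) return b.value - a.value;
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return { rows };
}

const grouped = {
  rows: [
    { key: "a@example.com", value: 2 },
    { key: "aaa@example.com", value: 1 },
    { key: "aaap@example.com", value: 34 },
    { key: "aabb@example.com", value: 1 },
  ],
};
console.log(sortRowsByValueDesc(grouped).rows.map((r) => r.key).join(", "));
// aaap@example.com, a@example.com, aaa@example.com, aabb@example.com
```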
And I need it sorted as quickly as is reasonably possible (e.g. if it only takes <100ms to update the indexes, it shouldn't take 1 minute before the new data is reflected in queries).
More context: what we already tried
The best answer to "Sorting CouchDB Views By Value" gives four viable options, which we've tried in increasing order of difficulty:
- First we sorted the results client side, but that was way too slow.
- Next we created a list function which sorts the data. A little faster, but still too slow.
- Chained Map-Reduce Views should handle this problem easily.
  - Someone pointed out Cloudant's Chained Map-Reduce Views. They are not in BigCouch but are part of Cloudant's services, which are unfortunately not in our budget at this time.
  - I started an application layer implementation using the _bulk_docs API. It is tricky if you want to keep updates as snappy as possible while avoiding race conditions, etc. I can continue with this approach, but it is not relaxing. :(
- The answer suggested using couchdb-lucene. But I'm not nearly familiar enough with full-text search to understand how to get it to do anything more sophisticated than index the document and return a search result. I don't even know where to start.
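One way the application-layer chaining described above could look is sketched below. Grouped results are copied into ordinary summary documents through `_bulk_docs`, and those documents are indexed by a second view keyed on the count. This is a hypothetical sketch, not the implementation referred to in the list. The summary-document layout (the `sender_count:` id prefix, the `type` and `count` fields), the `bySenderCount` view and the `buildSummaryBulkDocs` helper are all invented for illustration, and the `_rev` values in the example are made up. It only builds the `POST /{db}/_bulk_docs` request body. Fetching the current summaries, sending the request, and retrying on per-document conflicts (the race conditions mentioned above) are left out.

```javascript
// Hypothetical summary-document layout: one doc per sender holding its count.
const SUMMARY_PREFIX = "sender_count:";

// Hypothetical second-stage view over the summary docs. Its key is the count,
// so querying it with descending=true returns senders ordered by value.
const bySenderCount = {
  map: "function(doc) { if (doc.type === 'sender_count') { emit(doc.count, doc.sender); } }",
};

// Builds a _bulk_docs body that brings the summary docs in line with the rows
// from by_sender?group=true.
//   groupedRows: [{ key: sender, value: count }, ...]
//   existing:    Map of sender -> current summary doc (must include _id, _rev)
function buildSummaryBulkDocs(groupedRows, existing) {
  const docs = [];
  const seen = new Set();
  for (const { key: sender, value: count } of groupedRows) {
    seen.add(sender);
    const current = existing.get(sender);
    if (current && current.count === count) continue; // unchanged: skip the write
    const doc = { _id: SUMMARY_PREFIX + sender, type: "sender_count", sender, count };
    // Supplying the current _rev makes this an update; a stale _rev comes back
    // as a per-document "conflict" error in the _bulk_docs response.
    if (current) doc._rev = current._rev;
    docs.push(doc);
  }
  // Senders that no longer appear in the grouped view get their summary deleted.
  for (const [sender, current] of existing) {
    if (!seen.has(sender)) {
      docs.push({ _id: current._id, _rev: current._rev, _deleted: true });
    }
  }
  return { docs };
}

// Example with made-up revisions: one unchanged sender, one new, one gone.
const existing = new Map([
  ["a@example.com", { _id: "sender_count:a@example.com", _rev: "1-abc", type: "sender_count", sender: "a@example.com", count: 2 }],
  ["gone@example.com", { _id: "sender_count:gone@example.com", _rev: "3-def", type: "sender_count", sender: "gone@example.com", count: 5 }],
]);
const rows = [
  { key: "a@example.com", value: 2 },
  { key: "aaap@example.com", value: 34 },
];
// Produces a create for aaap@example.com and a delete for gone@example.com;
// a@example.com is skipped because its count did not change.
console.log(JSON.stringify(buildSummaryBulkDocs(rows, existing)));
```

Skipping unchanged counts keeps each sync small. However, a sender whose count changes between reading the grouped view and posting the batch will still produce a conflict that has to be re-read and retried.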