I'm a CouchDB newbie.

I've already created a view that does an SQL-style LIKE on my products (the keys are all the words in the code and description).

function (doc) {
    if (doc.type === 'product') {
        var words = { };
        var text = doc.code + ' ' + doc.description;
        text.replace(/\w+/g, function(word) {
            words[word.toLowerCase()] = true;
        });
        for (var w in words) {
            emit(w, doc);
        }
    }
}

My products also belong to a category. I want to let the user get the products from a given category and THEN apply that LIKE on that subset.

A second view on the product category would solve the filter by category.

Question

What is the CouchDB way to do this?

The options I see are:

  1. Build a view named like_by_category whose key is the compound [category, word].

  2. Query a category view first, then the LIKE-by-word view, and manually join both result sets to see which results appear in both.

Any rope would help me get out of this hole!

Option 1 is just theory; I don't know whether it will let me do pagination with ease.
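
To make option 1 concrete, here is a minimal sketch of a map function keyed on [category, word], plus the key range a prefix LIKE would use. It assumes each product has a `category` field (not shown in the view above). The helpers `prefixRange`, `buildIndex`, and `queryIndex` are hypothetical: the last two are only an in-memory stand-in for the view engine so the sketch runs on its own, and their plain string ordering is a simplification of CouchDB's collation.

```javascript
// Map function for the like_by_category view.
// `emit` is passed in only so this sketch runs standalone; inside a design
// doc it is a global and the signature is `function (doc)`.
function likeByCategoryMap(doc, emit) {
    if (doc.type !== 'product') { return; }
    var words = {};
    var text = doc.code + ' ' + doc.description;
    (text.match(/\w+/g) || []).forEach(function (word) {
        words[word.toLowerCase()] = true;   // de-duplicate words per doc
    });
    Object.keys(words).forEach(function (w) {
        emit([doc.category, w]);            // assumes a `category` field
    });
}

// Key range covering every word in `category` that starts with `prefix`.
function prefixRange(category, prefix) {
    var p = prefix.toLowerCase();
    return { startkey: [category, p], endkey: [category, p + '\ufff0'] };
}

// --- In-memory stand-in for the view engine (illustration only) ---
function compare(a, b) { return a < b ? -1 : (a > b ? 1 : 0); }
function compareKeys(a, b) { return compare(a[0], b[0]) || compare(a[1], b[1]); }

// Run the map function over all docs and sort rows by key, then by id.
function buildIndex(docs) {
    var rows = [];
    docs.forEach(function (doc) {
        likeByCategoryMap(doc, function (key) {
            rows.push({ id: doc._id, key: key });
        });
    });
    return rows.sort(function (a, b) {
        return compareKeys(a.key, b.key) || compare(a.id, b.id);
    });
}

// Inclusive key-range query with optional skip/limit paging.
function queryIndex(rows, opts) {
    var hits = rows.filter(function (row) {
        return compareKeys(row.key, opts.startkey) >= 0 &&
               compareKeys(row.key, opts.endkey) <= 0;
    });
    var skip = opts.skip || 0;
    var end = opts.limit === undefined ? hits.length : skip + opts.limit;
    return hits.slice(skip, end);
}
```

Because all rows for one category and prefix are contiguous in the index, `limit` and `skip` page through them directly. Against a real database the same `startkey`/`endkey`/`limit`/`skip` options would be passed to the view query. Note that a product with two words sharing the prefix shows up twice in the range, so results may need de-duplicating by id.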

Option 2 is also just theory, but I'm not sure about the performance of running those two views, especially on mobile devices running http://www.pouchdb.com
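
For comparison, the manual join in option 2 could look roughly like the sketch below. It assumes both queries return rows shaped like view results, each with an `id` field; `intersectById` is a hypothetical helper name, not part of CouchDB or PouchDB.

```javascript
// Keep the ids that appear in both result sets, in the order of `wordRows`.
// Each row is assumed to look like a view row: { id: ..., key: ... }.
function intersectById(categoryRows, wordRows) {
    var inCategory = Object.create(null);
    categoryRows.forEach(function (row) {
        inCategory[row.id] = true;
    });
    var seen = Object.create(null);
    var ids = [];
    wordRows.forEach(function (row) {
        // skip docs outside the category and docs already collected
        if (inCategory[row.id] && !seen[row.id]) {
            seen[row.id] = true;
            ids.push(row.id);
        }
    });
    return ids;
}
```

The join itself is cheap, but both result sets have to be fetched in full before intersecting, and paging only works on the joined list in memory.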

Javier Castro

1 Answer

  1. On CouchDB, if you want to do full-text search, then your best option is the CouchDB Lucene plugin. Splitting on whitespace works in a pinch, but for morphological inflections like work/works/worked/working, and especially for non-English languages, you'll want a real tokenizer.
  2. On PouchDB, there is the PouchDB Search plugin, which unfortunately is currently not very performant, because it reads the entire database into memory. If you're targeting only mobile devices (which I assume means "iOS and Android"), you should look into the FTS capabilities of SQLite using WebSQL. Here's some discussion of FTS and a live example.
nlawson
  • So you are saying I have no plain PouchDB solution for this? – Javier Castro May 30 '14 at 17:56
  • Sorry for not providing a helpful answer. Your option #1 is probably the most efficient, although I hope you're also saving it in a design doc first for maximum performance. Also don't do `emit(w, doc)`; just do `emit(w)` and then use `{include_docs:true}` in your query later. – nlawson May 31 '14 at 23:21
  • I have a question: if I create the db, put a view on it, query it to trigger its build, then sync to a remote, and finally query the view again, would the index already be updated with the synced data and return results fast? – Javier Castro Jun 02 '14 at 15:43
  • Actually, if changes are synced from the remote to the local, you'd have to query again in order to update it. This is unfortunately just how views work in CouchDB. However, if the changes are small, you might still find that it's plenty fast! :) – nlawson Jun 02 '14 at 16:21
  • Yes, but the first time, the changes are ALL documents, which can be almost 100k docs... so the indexing takes a lot of time. I thought indexing was done during replication too, when new docs were added/updated... So I'm totally wrong and the only way is to wait for the first sync, right? – Javier Castro Jun 02 '14 at 19:57
  • Yeah, indexes are not updated automatically. However, they are updated incrementally, so you don't need to wait for the first sync. In your use case, it sounds like you should just call `query()` without passing in the `stale` option. You'll take a slight performance hit, but your results will be up-to-date. – nlawson Jun 02 '14 at 22:21
  • If I have 5 views and 300k docs in PouchDB, and then I do a query() on each of them, PouchDB will execute each view's JS function on each of the 300k docs... 5 times... right? – Javier Castro Jun 03 '14 at 14:11
  • Yep, the first time anyway. You can use native Erlang views or split up your db into multiple dbs or use fewer views if you want this to run faster. In practice I personally have used the strategy of sharding my database into many smaller databases, which is basically what CouchDB itself will achieve when they finish the BigCouch merge. – nlawson Jun 03 '14 at 18:19
  • The section [Use and abuse your doc IDs](http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html) of your blog post was very, very useful for my use case :) – Javier Castro Jun 04 '14 at 15:30
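
As an illustration of the doc ID idea mentioned in the last comment, one possible approach is to encode the category into each product's `_id` so that a whole category can be fetched with a key range on the primary index, without building a view. The ID scheme below (`product_<category>_<code>`) and the helper names are assumptions made for this sketch, not something stated in the thread; `idsInRange` is only an in-memory stand-in for a `startkey`/`endkey` range query on document IDs.

```javascript
// Hypothetical ID scheme: product_<category>_<code>
function productId(category, code) {
    return ['product', category, code].join('_');
}

// Range of _id values covering every product in `category`.
function categoryIdRange(category) {
    var prefix = 'product_' + category + '_';
    return { startkey: prefix, endkey: prefix + '\ufff0' };
}

// In-memory stand-in for a sorted, inclusive ID range query.
function idsInRange(ids, range) {
    return ids.slice().sort().filter(function (id) {
        return id >= range.startkey && id <= range.endkey;
    });
}
```

With real data the same `startkey` and `endkey` would go to `allDocs()`. One caveat of this particular scheme: a category name that itself contains `_` (say `tools_big`) would also fall inside the range for `tools`, so the separator needs to be a character that never appears in category names.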