18

I would like to use CouchDB to store some data and then use RESTful API calls to get the data that I need. My database is called "test", and my documents all have a similar structure and look something like this (where hello_world is the document ID):

"hello_world" : {"id":123, "tags":["hello", "world"], "text":"Hello World"}
"foo_bar" :{"id":124, "tags":["foo", "bar"], "text":"Foo Bar"} 

What I'd like to be able to do is have my users send a query such as "Give me all the documents that contain the words 'hello world'". I've been playing around with views, but it looks like they will only allow me to move one or more of those values into the "key" portion of the map function. That gives me the ability to do something like this:

http://localhost:5984/test/_design/search/_view/search_view?key="hello"

But this doesn't let my users specify their own query string. For example, what if they searched for "hello world"? I'd have to do two queries, one for "hello" and one for "world", and then write a bunch of JavaScript to combine the results, remove duplicates, etc. (YUCK!). What I really want is to be able to do something like this:

http://localhost:5984/test/_design/search/_view/search_view?term="hello world"

Then use the parameter "hello world" in the view's map/reduce functions to find all the documents that contain both "hello" and "world" in the tags array. Is this sort of thing even possible with CouchDB? Is there another way to accomplish this inside a view that I'm not thinking of?

Mike Farmer

3 Answers

20

CouchDB views do not support faceted search, full-text search, or result intersection. The couchdb-lucene plugin lets you do all of these things.

http://github.com/rnewson/couchdb-lucene/tree/master
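
For illustration only, here is a minimal sketch of the client-side intersection that views cannot do for you, i.e. the workaround the question describes. It assumes each response has the standard view result shape ({ rows: [{ id, key, value }] }); the helper name intersectIds and the sample rows (including "other_doc") are made up for the example.

```javascript
// Hypothetical helper: intersect the document ids from several view responses.
// Each response is assumed to look like { rows: [{ id, key, value }, ...] }.
function intersectIds(responses) {
  if (responses.length === 0) return [];
  // Collect the distinct document ids of each response.
  var idSets = responses.map(function (res) {
    return new Set(res.rows.map(function (row) { return row.id; }));
  });
  // Keep the ids from the first response that appear in every response.
  return Array.from(idSets[0]).filter(function (id) {
    return idSets.every(function (set) { return set.has(id); });
  });
}

// Made-up sample data standing in for the results of ?key="hello" and ?key="world".
var helloRows = { rows: [{ id: "hello_world", key: "hello", value: null }] };
var worldRows = { rows: [{ id: "hello_world", key: "world", value: null },
                         { id: "other_doc", key: "world", value: null }] };

console.log(intersectIds([helloRows, worldRows])); // [ 'hello_world' ]
```

This costs one request per search term plus the merge on the client, which is the overhead a full-text index avoids.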

Jan Lehnardt
2

Technically this is possible if, for each document, you emit every set in the powerset of the document's tags as a key. The elements of each key set must be ordered, and your query would have to supply the tags in the same order.

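function map(doc) {
  // Returns every subset of the array, each subset in sorted order.
  function powerset(array) { ... }

  var powerset_of_tags = powerset(doc.tags);
  // Emit one row per subset so any ordered combination of tags can be used as a key.
  for (var i = 0; i < powerset_of_tags.length; i++) {
    emit(powerset_of_tags[i], doc);
  }
}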

For the doc "hello_world" : {"id":123, "tags":["hello", "world"], "text":"Hello World"} this would emit:

{ key: [], doc: ... }
{ key: ['hello'], doc: ... }
{ key: ['world'], doc: ... }
{ key: ['hello', 'world'], doc: ... }

Although this is possible, I would consider it a rather awkward solution. I don't want to imagine the disk usage of the view for a larger number of tags. I expect the number of emitted keys to grow like 2^n, where n is the number of tags on a document.
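
The powerset helper is left out above. One possible way to write it is sketched below; this is an assumption about what such a helper could look like, not the original author's code. It sorts a copy of the tags first so that every emitted key has a canonical order, and it shows how quickly the number of keys grows.

```javascript
// Sketch of a powerset helper: returns every subset of `tags`, each one sorted.
function powerset(tags) {
  var sorted = tags.slice().sort(); // sort a copy so keys have a canonical order
  var subsets = [[]];               // start with the empty set
  sorted.forEach(function (tag) {
    // Every subset found so far appears once without `tag` and once with it.
    subsets = subsets.concat(subsets.map(function (subset) {
      return subset.concat([tag]);
    }));
  });
  return subsets;
}

console.log(powerset(["world", "hello"]));
// [ [], [ 'hello' ], [ 'world' ], [ 'hello', 'world' ] ]

// Ten tags already produce 2^10 keys for a single document.
console.log(powerset("abcdefghij".split("")).length); // 1024
```

With keys built this way, a query for both tags would use the sorted, JSON-encoded array as the key, e.g. ?key=["hello","world"].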

WispyCloud
ordnungswidrig
    this is not recommended. Performance will suffer greatly and as you mentioned the storage for the indexes will grow out of control. couchdb-lucene mentioned above is the correct way to do what he's wanting. – Jeremy Wall Aug 10 '09 at 04:17
0

Under the hood, CouchDB stores data in a B-tree, so you should use views to pre-process it. The limitation in this case is that you cannot search by regex. As an alternative, you can search by key prefix in a view (suffix search works the same way if you emit the reversed string as the key).

Note: don't use emit(key, doc), because it copies the whole document into the index. Use emit(key, null) or emit(key) instead and add include_docs=true to the query.

You can use your tags as the keys to query.

//view function

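function (doc) {
  // Emit one row per tag; the value is null so the document is not copied into the index.
  if (Array.isArray(doc.tags)) {
    doc.tags.forEach(function (tag) {
      emit(tag, null);
    });
  }
}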

//view query

db
.query(your_view_name,
      { startkey: startkey, endkey: endkey, include_docs: true });

Note: startkey is the prefix to match, and endkey is that prefix followed by a high Unicode character:

startkey = "h", "he", "hell"...
endkey = startkey + "\uffff";
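
As a rough sketch of the prefix range described above, the two helpers below build the query options and turn them into a query string for the HTTP view API, where key values have to be JSON-encoded. The names prefixRange and toQueryString are made up for this example.

```javascript
// Hypothetical helper: view query options matching every key that starts with `prefix`.
function prefixRange(prefix) {
  return {
    startkey: prefix,
    endkey: prefix + "\uffff", // high code point, so the range covers prefix + anything
    include_docs: true         // fetch the documents instead of emitting them as values
  };
}

// Hypothetical helper: build a query string with JSON-encoded parameter values.
function toQueryString(options) {
  return Object.keys(options).map(function (name) {
    return name + "=" + encodeURIComponent(JSON.stringify(options[name]));
  }).join("&");
}

// The result can be appended to a view URL such as
// http://localhost:5984/test/_design/search/_view/search_view?
console.log(toQueryString(prefixRange("he")));
```

The object returned by prefixRange can also be passed directly as the options argument of the db.query call shown above.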

Also: never use a Mango query with a regex if you care about performance. I brought one query down from 2 minutes to 2 seconds by switching to a view function.