
We are reaching a phase in our project where we would like to search our documents with a regex that matches certain strings. CouchDB now allows this as of version 2.0 with /db/_find, which is great. Before that, you would have needed Elasticsearch.
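For reference, a regex search through `_find` is a POST of a Mango query whose selector uses the `$regex` operator. The sketch below only builds the request and does not send it. The server URL, the database name `mydb` and the field `title` are hypothetical placeholders, not anything from the original question.

```python
import json
import urllib.request

# Hypothetical values for illustration: a local CouchDB 2.x node,
# a database called "mydb" and a string field called "title".
COUCH_URL = "http://localhost:5984"
DB_NAME = "mydb"


def build_regex_find(field: str, pattern: str) -> dict:
    """Return a Mango query body that matches `field` against `pattern`."""
    return {
        "selector": {field: {"$regex": pattern}},
        "limit": 25,
    }


def make_find_request(body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /{db}/_find."""
    return urllib.request.Request(
        f"{COUCH_URL}/{DB_NAME}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_regex_find("title", "^Couch")
req = make_find_request(body)
print(req.get_method(), req.full_url)
print(json.dumps(body))
# To actually run the query: urllib.request.urlopen(req)
```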

I would like to know whether one solution is better than the other, and what the consequences are in terms of disk storage. I saw a lot of warnings about the /db/_find feature in the CouchDB documentation, for instance:

> Regular expressions do not work with indexes, so they should not be used to filter large data sets.

Thanks in advance for any insight.

Jonathan Hall
betelgeuz

1 Answer


Simply put, if you use regular expressions with _find, every document in your database is scanned for every query you issue. This is completely different from Elasticsearch, which is optimized for free-text queries.
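To make the cost concrete, here is a toy in-memory model of a regex-only selector. It is not CouchDB code; it assumes, as the answer states, that no index can serve the `$regex` clause, so every document has to be loaded and tested. The document shape and the `regex_scan` helper are hypothetical.

```python
import re

# Toy model only: counts how many documents a regex-only selector must examine
# when no index can narrow the candidate set.


def regex_scan(docs, field, pattern):
    """Test every document against the regex and report how many were examined."""
    compiled = re.compile(pattern)
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        value = doc.get(field)
        # Only string values are tested; anything else never matches.
        if isinstance(value, str) and compiled.search(value):
            matches.append(doc["_id"])
    return matches, examined


docs = [{"_id": f"doc{i}", "title": f"note {i}"} for i in range(10_000)]
docs.append({"_id": "special", "title": "CouchDB regex test"})

matches, examined = regex_scan(docs, "title", r"^Couch")
print(matches, examined)
```

Only one document matches, yet all 10,001 were examined; that ratio is what the documentation warning about large data sets refers to.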

So if you want good performance with regex queries, use Elasticsearch.

The complete picture is a bit more complicated: if your selector also has a fixed part that can use an index, you can optimize the query. If you elaborate on exactly what your use case is, I might be able to help further.
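A minimal sketch of that "fixed part" idea, assuming a hypothetical schema where every document carries a `type` field alongside the free-text `title`. The first body would be POSTed to `/{db}/_index` to create a JSON index on `type`; the second is a `_find` selector whose equality clause on `type` can be served by that index, so the `$regex` only has to run against documents of that type rather than the whole database.

```python
import json

# Hypothetical schema for illustration: each document has a fixed "type"
# field (e.g. "article") and a free-text "title" field.

# Body for POST /{db}/_index: a JSON index on the equality-matched field.
INDEX_BODY = {
    "index": {"fields": ["type"]},
    "name": "type-index",
    "type": "json",
}


def build_find(doc_type: str, pattern: str) -> dict:
    """Body for POST /{db}/_find.

    The equality clause on "type" can be answered from the index above,
    so only documents of that type are read; the $regex on "title" is
    then evaluated against those candidates only.
    """
    return {
        "selector": {
            "type": doc_type,
            "title": {"$regex": pattern},
        }
    }


print(json.dumps(INDEX_BODY))
print(json.dumps(build_find("article", "^Couch")))
```

Posting the same `_find` body to `/{db}/_explain` should show which index the query planner picked; if it reports the special `_all_docs` index, the query is still a full scan.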

Bernhard Gschwantner
  • There's actually a module released by Cloudant that adds full-text search indexes to CouchDB, running on Lucene. It's not enabled by default in CouchDB 2.0 builds, but it's not hard to add either. Cloudant runs it in production. But this makes the question slightly more complicated: how do Dreyfus/Clouseau compare against Elasticsearch? https://github.com/cloudant-labs/dreyfus – dmunch Apr 06 '17 at 14:58
  • There's also couchdb-lucene, which has been around for years: https://github.com/rnewson/couchdb-lucene. There are differences, but in the end the answer to the question is the same: use a full-text engine for full-text queries, and use CouchDB for simpler, structured queries. – Bernhard Gschwantner Apr 09 '17 at 12:54
  • There's also a [nice thread](https://github.com/rnewson/couchdb-lucene/issues/236) comparing couchdb-lucene and Clouseau on CouchDB 2.0. – dmunch Apr 11 '17 at 18:38