How to use regex in secondary indexes on rethinkdb?

Question

I have over 300 000 records in rethinkdb in document "car" in database "db". When I try to do "widlcard search" with filter for property "name" it takes forever to load. This is code:

r.db('db').table('car').filter(r.row('name').match("(?i).\*"+search_string+".\*"))

where search_string is some part of "car" name.

This is working but most of the times it takes forever (and I mean forever).

Now I have created secondary index on document "car" for property "name" but I do not know how to use that "wildcard" search on getAll.

r.db('db').table('car').getAll(r.row('name').match("(?i).\*"+search_string+".\*"), {index:"name"})

this is not working of course.

On this url: https://rethinkdb.com/blog/1.6-release/ I found this:

// Create a secondary index for all users whose first names begin with `S`
r.table('users').indexCreate('starts_with_S', r.row('first_name').match('S.*').ne(null))

// Efficiently get all the users where `first_name` begins with `S`
r.table('users').getAll(true, {index: 'starts_with_S'})

but I do not know how that can help. I think it is just mad to create index for each search on "name".

When full string is matched, of course, getAll on my index "name" is working but that is not really what I am asking for.

I can see a lot of people have same problem.

Any solution maybe?

score 1 · Accepted Answer · answered Feb 25 '16 at 23:06

What you want is more suitable for another database with full text search support like ElasticSearch. Have a looks in this article: https://www.rethinkdb.com/docs/elasticsearch/

RethinkDB doesn't have any capabilities to do full text search on an index, or apply arbitrary on index. It's quite limited to what kind of operation can be done on index such as getAll, between.

score 1 · Answer 2 · edited May 23 '17 at 12:32

You may want to build a secondary index out of 'name' characters split up and incrementally including more characters. You would do something like:

r.db('test').table('names').get('9c2472d9-8d0e-4f8d-8a6a-5c04076e4abd').do(
  function(entry) {
    return r.range(1, entry('name').count().add(1)).map(
      function(index) {
      return entry('entry').slice(0, index);
      })
  }
).coerceTo('array');

Which, assuming your name was 'Volvo', would return the following: [ "V" , "Vo" , "Vol" , "Volv" , "Volvo" ]

Tweak with that query to get the result you want, then just create a secondary multi index over that array of objects (that can be a bit tricky, so you might want to refer to this question for guidance: How to query a multi index in RethinkDB over an array of objects).

Then simply perform a r.db('test').table('names').getAll('query', {index: 'your_new_index_name'}) and that should do the trick, because you will be able to query both 'Volv', 'Vol', 'Volvo', etc.

Note, however, that this only works if your queries begin from the beginning of the word: thus, if you have 'Renault Kadjar' in your list, it will work if the user types in 'Renault', but it will not if they type in 'Kadjar' (you will need to tweak with that query even more to split your name into words and them split them up further).

How to use regex in secondary indexes on rethinkdb?

2 Answers2