
I'm using searchkick and see that a search for "animals" returns results for "anime" because both words share the stem "anim". Does anyone have suggestions on how to improve these results?

I see in the docs that you can do something like this:

exclude_queries = {
  "animals" => ["anime"],
}

Product.search query, exclude: exclude_queries[query]

But it seems like a lot of work to keep a running list for all of the bad ones like this.

Wondering if I need to change the stemmer?

user2031423

2 Answers

1

It looks like you are using the english analyzer, which applies a stemmer, rather than the standard analyzer, which does not stem tokens. That produces stemmed tokens, as shown below:

POST http://{{hostname}}:{{port}}/{{index-name}}/_analyze

{
    "text" : "animals",
    "analyzer" : "english"
}

{
    "tokens": [
        {
            "token": "anim",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

The standard analyzer (the default for text fields) generates non-stemmed tokens:

{
    "text" : "animals",
    "analyzer" : "standard"
}

{
    "tokens": [
        {
            "token": "animals",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

If you use the standard analyzer, tokens are not stemmed, so "animals" will no longer match "anime". But "running" will not be reduced to "run" either, so a search for "running" will not return results for "run", "runs", etc. It's a trade-off: choose and adjust the analyzers according to your business requirements.

Amit
  • Thanks for the confirmation. I was hoping there was something else I could do between those two choices. I definitely don't want to lose the power of handling plurals etc, but want my results to be more relevant. I would expect that things matched with `"animals"` would get a higher relevancy ranking, but maybe that is not the case? – user2031423 Jun 23 '20 at 15:45
  • @user2031423 https://www.elastic.co/guide/en/elasticsearch/reference/master/mixing-exact-search-with-stemming.html could be an option, and I have seen a lot of people use it. But be aware of the performance cost: you are now storing multiple forms of the same data, which increases your index size, and you need to query all of these fields. If you don't have a large dataset, that's not much of a problem. Again, it's a trade-off between functional and non-functional (performance) requirements. – Amit Jun 24 '20 at 01:43
0

I might try something like this. https://www.elastic.co/guide/en/elasticsearch/reference/master/mixing-exact-search-with-stemming.html
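As a rough sketch of what that page describes, the same field can be indexed twice, once with the english analyzer and once with a non-stemming analyzer, and exact matches can then be boosted. The Ruby below only builds raw Elasticsearch request bodies; it is not a searchkick option. The field name `name`, the helper `search_body`, and the boost value are assumptions made for illustration.

```ruby
require "json"

# Hypothetical field name, chosen only for this example.
FIELD = "name"

# Index settings and mapping: the main field is stemmed ("english"),
# while the "exact" sub-field only tokenizes and lowercases.
index_body = {
  settings: {
    analysis: {
      analyzer: {
        english_exact: { tokenizer: "standard", filter: ["lowercase"] }
      }
    }
  },
  mappings: {
    properties: {
      FIELD => {
        type: "text",
        analyzer: "english",
        fields: {
          exact: { type: "text", analyzer: "english_exact" }
        }
      }
    }
  }
}

# Query body: the stemmed field decides what matches, and the
# non-stemmed sub-field adds score to documents containing the exact word.
def search_body(query, field: FIELD, exact_boost: 2.0)
  {
    query: {
      bool: {
        must: { match: { field => query } },
        should: {
          match: { "#{field}.exact" => { query: query, boost: exact_boost } }
        }
      }
    }
  }
end

puts JSON.pretty_generate(index_body)
puts JSON.pretty_generate(search_body("animals"))
```

With a query shaped like this, "anime" documents would still match a search for "animals", but documents containing the literal word "animals" should score higher. The cost is the larger index mentioned in the comments above.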

Update

Ankane of the searchkick gem was kind enough to add a feature to help with this. As of 4.4.1, you can do this:

class Product < ApplicationRecord
  searchkick stemmer_override: ["anime => anime"]
end

This prevents "anime" from being stemmed to "anim", so it won't show up in the "animals" search results.
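To see why a rule like "anime => anime" helps, here is a toy model in plain Ruby of an override step that runs before a stemmer. It is not searchkick or Elasticsearch code: `TOY_STEMS` is a hypothetical lookup table standing in for the real stemmer, and `parse_rules` and `analyze` are names made up for this sketch.

```ruby
# Hypothetical lookup table standing in for a real stemmer.
TOY_STEMS = {
  "animals" => "anim",
  "animal"  => "anim",
  "anime"   => "anim"
}.freeze

# Parse rules written as "source => target" into a Hash.
def parse_rules(rules)
  rules.to_h do |rule|
    source, target = rule.split("=>").map(&:strip)
    [source, target]
  end
end

# Tokens with an override rule are emitted as the rule says and skip
# the stemmer; every other token goes through the toy stemmer.
def analyze(token, overrides = {})
  token = token.downcase
  return overrides[token] if overrides.key?(token)

  TOY_STEMS.fetch(token, token)
end

overrides = parse_rules(["anime => anime"])

puts analyze("animals")           # anim
puts analyze("anime")             # anim  (collides with "animals")
puts analyze("anime", overrides)  # anime (no longer collides)
```

Because the override is part of the analysis settings, the index presumably needs to be rebuilt (for example with `Product.reindex`) before existing documents pick up the change.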

user2031423