
I have configured Elasticsearch in my model to tolerate spelling mistakes, so that if the user enters something like "Rentaal", the search should still return the records matching "Rental".

document.rb code

require 'elasticsearch/model'

class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  belongs_to :user

  Document.import force: true


  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end


  settings index: {
    number_of_shards: 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
        }
      },
      # Custom filters must be nested under `analysis`, next to `analyzer`.
      filter: {
        edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
      }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
    end
  end
end

search controller code:

def search
  if params[:query].nil?
    @documents = []
  else
    @documents = Document.search params[:query]
  end
end

However, if I enter Rentaal or any misspelled word, it does not display anything. In my console

     @documents.results.to_a 

gives an empty array.

What am I doing wrong here? Let me know if more data is required.

Mahesh Mesta

1 Answer


Try adding fuzziness to your multi_match query:

{
  "query": {
    "multi_match": {
      "query": "Rentaal",
      "fields": ["name^10", "service"],
      "fuzziness": "AUTO"
    }
  }
}
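In the Ruby model from the question, the same request body could be built as a hash. This is only a sketch: `fuzzy_multi_match_body` is a hypothetical helper name introduced here for illustration, and the field names are copied from the question.

```ruby
# Builds the request body for a fuzzy multi_match search.
# `fuzzy_multi_match_body` is a hypothetical helper, not part of any gem.
def fuzzy_multi_match_body(query)
  {
    query: {
      multi_match: {
        query: query,
        fields: ['name^10', 'service'],
        fuzziness: 'AUTO' # tolerate a small number of character edits per term
      }
    }
  }
end

# Inside the Document model it could be used like this:
#   def self.search(query)
#     __elasticsearch__.search(fuzzy_multi_match_body(query))
#   end
```

Keeping the body in a plain method makes it easy to inspect in the console before sending it to Elasticsearch.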

Explanation

The kstem filter reduces words to their root forms, so it does not help in the way you expected here: it would correctly handle inputs like Renta or Rent, but not the misspelling you provided.

You can check what the analyzer produces for a given input with the following query:

curl -X POST \
  'http://localhost:9200/my_index/_analyze?pretty=true' \
  -H 'Content-Type: application/json' \
  -d '{
  "analyzer" : "edge_ngram_analyzer",
  "text" : ["rentaal"]
}'

As a result I see:

{
    "tokens": [
        {
            "token": "ren"
        },
        {
            "token": "rent"
        },
        {
            "token": "renta"
        },
        {
            "token": "rentaa"
        },
        {
            "token": "rentaal"
        }
    ]
}

So typical misspellings are handled much better by applying fuzziness.
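To see why fuzziness covers this case, it may help to compare the edit distance between the typed and the stored term with what "AUTO" allows. The sketch below is a rough illustration only: it uses plain Levenshtein distance (Elasticsearch also counts a transposition as a single edit), it ignores the analyzer, and the method names are hypothetical.

```ruby
# Plain Levenshtein edit distance (insertions, deletions, substitutions).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Edits allowed by fuzziness "AUTO" with its default thresholds (3 and 6):
# 0-2 characters must match exactly, 3-5 allow one edit, longer terms allow two.
def auto_fuzziness(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

typed  = 'rentaal'
stored = 'rental'
puts levenshtein(typed, stored) # edit distance between the two terms
puts auto_fuzziness(typed)      # edits AUTO allows for a 7-character term
```

"rentaal" is one deletion away from "rental", which is within the two edits that "AUTO" allows for a term of that length.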

Joanna Mamczynska
Sorry for the late reply; I didn't have much time earlier. I updated my answer with an explanation of why `kstem` is not enough in your case; it may be useful in the future. – Joanna Mamczynska Jul 31 '17 at 11:12