
I'm using this mapping:

  settings index: { number_of_shards: 1, number_of_replicas: 1 }, analysis: {
    analyzer: {
      custom_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "asciifolding", "custom_unique_token", "custom_tokenizer"]
      }
    },
    filter: {
      custom_word_delimiter: {   # defined but not referenced by custom_analyzer
        type: "word_delimiter",
        preserve_original: "true"
      },
      custom_unique_token: {
        type: "unique",
        only_on_same_position: "false"
      },
      custom_tokenizer: {        # despite the name, this is an nGram token *filter*
        type: "nGram",
        min_gram: "3",
        max_gram: "10",
        token_chars: [ "letter", "digit" ]
      }
    }
  } do
    mappings dynamic: 'false' do
      indexes :searchable, analyzer: "custom_analyzer"
      indexes :year
    end
  end
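For reference, what this analyzer actually emits can be checked with the _analyze API. Here is a sketch using the elasticsearch-ruby client; the client setup and the index name my_index are illustrative:

  # Sketch: inspect the tokens custom_analyzer produces for a sample term.
  # `client` and the index name 'my_index' are illustrative.
  require 'elasticsearch'
  client = Elasticsearch::Client.new

  response = client.indices.analyze(
    index: 'my_index',
    body: { analyzer: 'custom_analyzer', text: 'Boucab' }
  )

  # With min_gram 3 and max_gram 10, "boucab" expands to "bou", "ouc",
  # "uca", "cab", "bouc", ..., "boucab": each nGram is a separate token
  # that can match documents on its own.
  puts response['tokens'].map { |t| t['token'] }.join(', ')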

And this query (rails app):

  search(query: { match: { searchable: { query: params[:text_search], minimum_should_match: "80%" } } }, size: 100)

My main problem is that the app always returns 100 documents (the maximum requested). Of those 100 documents, only the first 10 or 15 are relevant; the rest are much too far from the search term.

I tried to:

- increase the max_gram from 3 to 10
- raise minimum_should_match up to 99%

...but I always get 100 results.

I don't really understand why, for example, if I search for "Boucab", I get 15 good results first, but I also get "Maucaillou" in 99th place. How can I keep such barely-relevant matches out of the results?

My app is multilingual.

How can I avoid returning results with poor scores? Do I need to use the min_score parameter? Is it the only solution?
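Here is the min_score variant I have in mind (a sketch only; the 0.5 threshold is illustrative, since _score values are not normalized across queries and the cutoff would have to be tuned against real searches):

  # Sketch: same match query as above, plus min_score to drop low-scoring
  # hits. The 0.5 threshold is illustrative and needs tuning per use case.
  search(
    query: {
      match: {
        searchable: {
          query: params[:text_search],
          minimum_should_match: "80%"
        }
      }
    },
    min_score: 0.5,
    size: 100
  )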

alex.bour
  • Probably because you are using the same analyzer for both search and index. So, basically, you are nGram-ing both the indexed text and the text you search for. – Andrei Stefan Apr 15 '16 at 09:21
  • What is your use case, though? Why have you chosen `nGram` and `match`? – Andrei Stefan Apr 15 '16 at 09:22
  • So you mean it's a bad idea to use match AND nGrams? What do you suggest? My use case is to get the most relevant results from 1M documents, searching only in one string of 3 to 50 chars, in all languages. I want to allow the user to make typos when searching; that's why I used nGrams. – alex.bour Apr 15 '16 at 09:29
  • I'm not saying it's a bad idea :-). That's the reason why you get so many results. ES is tokenizing the input, which means you will get a lot of nGrams, and it will use these to match a long list of nGrams from the index. You can imagine that many documents will match. Usually nGrams are used for indexing and then, at search time, depending on your use case, another analyzer is used. To provide suggestions while the user is typing, for example, a keyword analyzer is used and the `match` query is replaced with `term`. Etc. etc. – Andrei Stefan Apr 15 '16 at 09:33
  • OK. Thanks. In fact, I don't want autocompletion for the user search in my case. Only a simple search that allows typos. I will try to make a search analyzer without nGrams (see the first sketch after this thread). – alex.bour Apr 15 '16 at 09:40
  • Then why not use `fuzzy` search? – Andrei Stefan Apr 15 '16 at 09:45
  • Or a combination, using [multi fields](https://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html) to have several analyzers analyze your data: one analyzer that keeps the text as is and maybe only lowercases it (for a "perfect match" situation), then a `fuzzy` one with a different analyzer to allow for typos, and maybe others. Then your query would be a combination of `match` or `term` in a `bool` with `should`s that uses all the subfields (see the second sketch after this thread). – Andrei Stefan Apr 15 '16 at 09:49
  • In fact I was using `fuzzy` search first (fuzziness set to 2); then, as I'm learning ES, I tried nGrams in my custom analyzer and relevance seemed better?! So is it useful to keep nGrams in the index analyzer and to use `match` + `fuzzy` in my query? Or, if I come back to `fuzzy`, should I drop the nGrams? – alex.bour Apr 15 '16 at 09:50
  • Depends on your use case. I'd say try, test and see what you get. Also, why are you worried about the 100 responses you get? Can't you just limit the list to 10, for example? – Andrei Stefan Apr 15 '16 at 09:56
  • No, I can't limit to 10 results, because sometimes I will get 100 pertinent results, sometimes 3, etc. Well, I will try more tests with fuzzy. Thanks for your real interest, Andrei. Alex – alex.bour Apr 15 '16 at 10:06
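A sketch of the "search analyzer without nGrams" idea from the thread, assuming the elasticsearch-rails mapping from the question; the custom_search_analyzer (lowercase + asciifolding, no nGram filter) is an assumed addition to the settings block:

  # Sketch: keep nGrams at index time, but analyze the query text with a
  # simpler analyzer so the search input is not nGram-ed as well.
  # "custom_search_analyzer" (lowercase + asciifolding, no nGram filter)
  # is assumed to be defined in the settings shown above.
  mappings dynamic: 'false' do
    indexes :searchable,
            analyzer: "custom_analyzer",               # index-time nGrams
            search_analyzer: "custom_search_analyzer"  # whole tokens at search time
    indexes :year
  end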
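And a sketch of the multi-field combination suggested in the last comments; the sub-field name exact, the boost value, and the use of the built-in simple analyzer are all illustrative choices:

  # Sketch: index the same text twice (nGram-ed and lightly analyzed),
  # then combine an exact match and a fuzzy match in a bool/should.
  mappings dynamic: 'false' do
    indexes :searchable, analyzer: "custom_analyzer" do
      indexes :exact, analyzer: "simple"   # sub-field: lowercase, no nGrams
    end
    indexes :year
  end

  search(
    query: {
      bool: {
        should: [
          # "Perfect" matches on the lightly analyzed sub-field rank highest.
          { match: { "searchable.exact" => { query: params[:text_search], boost: 3 } } },
          # Fuzzy match tolerates typos; "AUTO" picks the edit distance by term length.
          { match: { searchable: { query: params[:text_search], fuzziness: "AUTO" } } }
        ],
        minimum_should_match: 1
      }
    },
    size: 100
  )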

0 Answers