
I'm trying to use Elasticsearch for partial matches on multiple fields using NGram, but I get 0 results after I build the index. This isn't coming very naturally to me, and I can't get NGram working for even one field. This is a passion project for me, and I really want the new search to work for partial word matches. I tried using fuzziness, but it started scoring incorrect matches too high.

Index Create:

var nGramFilters = new List<string> { "lowercase", "asciifolding", "nGram_filter" };

Client.Indices.Create(CurrentIndexName, c => c
    .Settings(st => st
        .Analysis(an => an // https://stackoverflow.com/questions/38065966/token-chars-mapping-to-ngram-filter-elasticsearch-nest
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    .Tokenizer("ngram_tokenizer")
                    .Filters(nGramFilters)
                )
            )
            .Tokenizers(tz => tz
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
    .Map<Package>(map => map
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Title)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.Summary)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PestControlledBy)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideControlsThesePests)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideInstructions)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideActiveIngredients)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticidesContainingThisActiveIngredient)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideNotSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
        )
    )
);

Query:

var result = _client.Search<Package>(s => s
    .From((form.Page - 1) * form.PageSize)
    .Size(form.PageSize)
    .Query(query => query
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Title.Suffix("ngram"), 1.5)
                .Field(p => p.Summary.Suffix("ngram"), 1.1)
                .Field(p => p.PestControlledBy.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideInstructions.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram"), 1.0)
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideSafeOn.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"), 1.0)
            )
            .Operator(Operator.Or) // https://stackoverflow.com/questions/46139028/elasticsearch-how-to-do-a-partial-match-from-your-query
            .Query(form.Query)
        )
    )
    .Highlight(h => h
        .PreTags("<strong>")
        .PostTags("</strong>")
        .Encoder(HighlighterEncoder.Html) //https://github.com/elastic/elasticsearch-net/issues/3091
        .Fields(
            fs => fs
                .Field(f => f.Summary.Suffix("ngram")),
            fs => fs
                .Field(p => p.PestControlledBy.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideInstructions.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideSafeOn.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"))
                .NumberOfFragments(10)
                .FragmentSize(250)
        )
    )
);

Am I even in the right ballpark? I tried using the default analyzer, but a query for "cat dandelion" doesn't match "cat's ear dandelion", and similar cases. With the default analyzer the whole word has to match, but I want partial matches so that, for example, "petal" and "petals" match each other. Any step in the right direction is appreciated. I'm completely new to Elasticsearch and NEST and have only been working with them for a week or so.
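To see why n-grams give partial matches where the default analyzer doesn't, here is a simplified, client-side imitation of an n-gram tokenizer. It is not Elasticsearch's implementation: it assumes lowercasing and splitting on anything that isn't a letter or digit (roughly `token_chars: [letter, digit]`, whereas the question's tokenizer also keeps punctuation and symbols), and `NGramSketch` is a hypothetical name. It also assumes the same analyzer runs at index and search time, which is what Elasticsearch does when no separate `search_analyzer` is configured.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical, simplified imitation of an n-gram tokenizer (not Elasticsearch's code).
public static class NGramSketch
{
    // Emits every substring of each word whose length is between minGram and maxGram,
    // ordered by start position, then by length.
    public static List<string> Tokenize(string text, int minGram, int maxGram)
    {
        var grams = new List<string>();
        foreach (var word in SplitWords(text))
        {
            for (int start = 0; start < word.Length; start++)
            {
                for (int len = minGram; len <= maxGram && start + len <= word.Length; len++)
                {
                    grams.Add(word.Substring(start, len));
                }
            }
        }
        return grams;
    }

    // Lowercases the text and treats anything that is not a letter or digit as a separator.
    private static IEnumerable<string> SplitWords(string text)
    {
        var current = new StringBuilder();
        foreach (char ch in text.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(ch))
            {
                current.Append(ch);
            }
            else if (current.Length > 0)
            {
                yield return current.ToString();
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }

    // With operator OR, a document is a candidate if it shares at least one gram with the query.
    public static bool SharesAGram(string query, string document, int minGram, int maxGram) =>
        Tokenize(query, minGram, maxGram).Intersect(Tokenize(document, minGram, maxGram)).Any();
}
```

With 3-4 grams, `NGramSketch.Tokenize("petal", 3, 4)` yields pet, peta, eta, etal, tal, all of which also occur in "petals", and `SharesAGram("cat dandelion", "Cat's ear dandelion", 3, 4)` is true because both sides produce the gram "cat".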

justiceorjustus

1 Answer


Your `client.Indices.Create` call is invalid, for two reasons:

  1. By default, the difference between MinGram and MaxGram can't be bigger than 1, which is why you get this error:

     Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: PUT /my_index1?pretty=true&error_trace=true. ServerError: Type: illegal_argument_exception Reason: "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [18]. This limit can be set by changing the [index.max_ngram_diff] index level setting."

     You can read more about this error here.

  2. There is no filter called `nGram_filter`; you need to change it to `ngram`.
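Here is a sketch of the index creation with both problems addressed, assuming NEST 7.x. It raises `index.max_ngram_diff` (the setting named in the error message) to 18 so that MinGram(2)/MaxGram(20) is accepted; keeping MaxGram - MinGram <= 1 (for example 3 and 4) would avoid the setting entirely. For the second problem it drops the custom token filter instead of renaming it to `ngram`, which is where the comment thread below ends up; renaming it would be the literal fix. The `Package` class is a minimal stand-in, `PackageIndex.Create` is a hypothetical helper, and only `Title` is mapped; the other properties follow the same pattern.

```csharp
using Nest;

// Minimal stand-in for the question's Package class (only Title is mapped here).
public class Package
{
    public string Title { get; set; }
}

public static class PackageIndex
{
    // Hypothetical helper name; assumes NEST 7.x.
    public static CreateIndexResponse Create(IElasticClient client, string indexName) =>
        client.Indices.Create(indexName, c => c
            .Settings(st => st
                // Raise the limit named in the error so MaxGram - MinGram = 18 is accepted.
                // Alternatively keep MaxGram - MinGram <= 1 (e.g. 3 and 4) and omit this setting.
                .Setting("index.max_ngram_diff", 18)
                .Analysis(an => an
                    .Analyzers(anz => anz
                        .Custom("ngram_analyzer", cc => cc
                            .Tokenizer("ngram_tokenizer")
                            // Only built-in filters; the undefined "nGram_filter" is gone.
                            .Filters("lowercase", "asciifolding")
                        )
                    )
                    .Tokenizers(tz => tz
                        .NGram("ngram_tokenizer", td => td
                            .MinGram(2)
                            .MaxGram(20)
                            .TokenChars(TokenChar.Letter, TokenChar.Digit,
                                        TokenChar.Punctuation, TokenChar.Symbol)
                        )
                    )
                )
            )
            .Map<Package>(map => map
                .AutoMap()
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Title)
                        .Fields(f => f
                            .Keyword(k => k.Name("keyword").IgnoreAbove(256))
                            .Text(tt => tt.Name("ngram").Analyzer("ngram_analyzer"))
                        )
                    )
                )
            )
        );
}
```

Because analysis settings and existing field mappings can't be changed in place this way, the old index has to be deleted (or a new index name used) and the documents reindexed before the new analyzer takes effect.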

I found these problems by checking the index mapping in Elasticsearch (`localhost:9200/YOUR_INDEX_NAME/_mapping`), which showed that the mapping hadn't been applied. The second step was to see what `DebugInformation` on the index creation response had to say:

var createIndexResponse = await client.Indices.CreateAsync("my_index1", ..);
Console.WriteLine(createIndexResponse.DebugInformation);
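To make this kind of failure visible in code rather than only in the mapping, one option is to check `IsValid` on the response and surface `DebugInformation` when it is false. This is a sketch assuming NEST 7.x; `IndexChecks.EnsureValid` is a hypothetical helper name.

```csharp
using System;
using Nest;

public static class IndexChecks
{
    // Hypothetical helper: stop early instead of continuing with an index whose
    // settings and mapping were never applied.
    public static void EnsureValid(CreateIndexResponse response)
    {
        if (response.IsValid)
        {
            return;
        }

        // DebugInformation includes the request, the response and the server error,
        // such as the max_ngram_diff illegal_argument_exception quoted in this answer.
        Console.WriteLine(response.DebugInformation);
        throw new InvalidOperationException(
            "Index creation failed: " + response.ServerError?.Error?.Reason);
    }
}
```

Calling `IndexChecks.EnsureValid(createIndexResponse);` right after `CreateAsync` turns a silently ignored 400 into an exception carrying the server's reason.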

Hope that helps.

Rob
  • This has made my index searchable and green status, so thank you for that. Is this not the right solution for what I'm trying to do? I'm getting some unpredictable results. I'm doing ngram 3-4, and isolated it to only query the title with ngram. I'm getting "Bracted Plantain (Plantago aristata)" scoring 0.3375051 and "Catnip (Nepeta cataria)" scoring 0.33674708 for query "cat". – justiceorjustus Feb 11 '20 at 14:32
  • Could you run your query with explain mode? That should shed some more light on why it's happening. – Rob Feb 11 '20 at 15:13
  • Can you tell me anything about this? https://imgur.com/a/gHa8BJP The blue Title is what's being queried. Here's an example where "canada toadflax" is scoring above "Cat's ear dandelion" for query "cat". – justiceorjustus Feb 11 '20 at 15:47
  • By checking the `_analyze` endpoint for both queries, I can see the `nGramFilters` filters are causing trouble here, especially the [`ngram`](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html) one. – Rob Feb 12 '20 at 06:58
  • Any suggestions on what I can change? Should I make a new question? – justiceorjustus Feb 12 '20 at 19:19
  • I think removing the ngram filter from your analyzer should do the trick in your case. – Rob Feb 12 '20 at 19:22
  • Thank you! After removing that, I think it's workable with boosts. Just one more question: If I don't put `.Suffix("ngram")` on my fields, does it use a default analyzer? – justiceorjustus Feb 12 '20 at 20:11
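  • Yes: without `.Suffix(...)` the query targets the top-level `text` field, which is analyzed with the default `standard` analyzer because the mapping sets no analyzer on it. Only `.Suffix("keyword")` targets the `keyword` sub-field, which is not analyzed. – Rob Feb 12 '20 at 20:18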
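Why the extra `ngram` token filter hurt scoring can be illustrated with another simplified, client-side simulation (not Elasticsearch's code). It assumes the 3-4 tokenizer grams mentioned in the comments, followed by an `ngram` token filter at its documented defaults (min_gram 1, max_gram 2), lowercasing, and splitting on anything that isn't a letter or digit; `DoubleNGramSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation of the analyzer chain discussed in the comments.
public static class DoubleNGramSketch
{
    // All substrings of `word` with length between minGram and maxGram.
    public static IEnumerable<string> Grams(string word, int minGram, int maxGram)
    {
        for (int start = 0; start < word.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= word.Length; len++)
            {
                yield return word.Substring(start, len);
            }
        }
    }

    // Tokenizer grams only: what the analyzer produces once the ngram filter is removed.
    public static HashSet<string> TokenizerOnly(string text) =>
        new HashSet<string>(Words(text).SelectMany(w => Grams(w, 3, 4)));

    // Tokenizer grams re-split by the ngram filter into 1- and 2-character grams.
    public static HashSet<string> TokenizerThenFilter(string text) =>
        new HashSet<string>(Words(text)
            .SelectMany(w => Grams(w, 3, 4))
            .SelectMany(token => Grams(token, 1, 2)));

    // Lowercase, then split on anything that is not a letter or digit.
    private static IEnumerable<string> Words(string text) =>
        new string(text.ToLowerInvariant().Select(c => char.IsLetterOrDigit(c) ? c : ' ').ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```

In this model the query "cat" becomes only {cat} without the filter, but {c, ca, a, at, t} with it. "Bracted Plantain (Plantago aristata)" shares no gram with the former yet several (c, a, at, t) with the latter, which is consistent with it scoring alongside "Catnip (Nepeta cataria)" in the comments above.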