I'm building a search database. Most entries are proper nouns (names and street addresses). I set up an ngram tokenizer to help with fast fuzzy searching, and it works well. However, if I search for "John Allen", the results include "John Allen" and "Allen John" with the same score (i.e. relevance ranking). How can I tune the index settings or query syntax so that Elasticsearch still returns both documents when I search for "John Allen", but assigns a higher score to "John Allen" than to "Allen John"?

Here are the index settings...

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "token_chars": [
            "letter",
            "digit",
            "custom"
          ],
          "custom_token_chars": "'-",
          "min_gram": "3",
          "type": "ngram",
          "max_gram": "4"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

and here is a sample query...

{
    "query": {
        "query_string": {
            "query": "Allen John",
            "fields": [
                "full_name"
            ]
        }
    }
}

Other notes:

  1. We are not using wildcards because they slow down queries.
  2. Our typical index will have 10 million documents or fewer.
  3. Speed is critical, just as it is in most Elasticsearch applications.
  4. From what I've read so far, the answer (or hints toward it) may lie in Elasticsearch's edge n-gram tokenizer or its completion suggester. Or maybe not.

I have also tried the following query (after reading "ElasticSearch taking word order into account in match query"), but it did not help with my issue.

{
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "Bill",
                    "fields": [
                        "full_name"
                    ]
                }
            },
            "should": {
                "span_near": {
                    "clauses": [
                        {
                            "span_term": {
                                "full_name": "Bill Tim"
                            }
                        }
                    ],
                    "slop": 5
                }
            }
        }
    }
}
GNG

2 Answers

We can add one more subfield that uses the standard analyzer. If the query string matches that subfield, the document gets a higher boost; if it does not, the document still gets the score from the ngram-analyzed field.

"mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "standard": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
}

Change the search query to include both fields, giving the standard subfield a higher boost value.

{
    "query": {
        "query_string": {
            "query": "Allen John",
            "fields": [
                "full_name", "full_name.standard^2"
            ]
        }
    }
}
Kumar V

One option is to add a second query that does a phrase search. If the phrase matches, that document is scored higher.

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Allen John",
            "fields": [
              "full_name"
            ]
          }
        },
        {
          "query_string": {
            "query": "\"Allen John\"",
            "fields": [
              "full_name"
            ]
          }
        }
      ]
    }
  }
}
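If the search text comes from user input, the two clauses above can be generated rather than written by hand. The following is only a sketch: the function name `build_phrase_boost_query` and its `field` parameter are made up for illustration, and it escapes only backslashes and double quotes, so other reserved `query_string` characters would still need handling.

```python
import json


def build_phrase_boost_query(text, field="full_name"):
    """Return a bool/should body: a loose clause plus a quoted phrase clause.

    Hypothetical helper, not part of any Elasticsearch client library.
    """
    # Escape backslashes first, then double quotes, so the text cannot
    # terminate the quoted phrase early. Other reserved characters are
    # deliberately left alone in this sketch.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "bool": {
                "should": [
                    # Loose clause: matches on any of the analyzed terms.
                    {"query_string": {"query": escaped, "fields": [field]}},
                    # Phrase clause: adds score only when the terms match in order.
                    {"query_string": {"query": f'"{escaped}"', "fields": [field]}},
                ]
            }
        }
    }


body = build_phrase_boost_query("Allen John")
print(json.dumps(body, indent=2))
```

The printed body can be sent as the request body of a `_search` call with whatever client is already in use.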
jaspreet chahal
  • Helpful, but this only works when the query is an exact match. If I search for "Alen John" (with one 'l') rather than "Allen John", the document "Allen John" does not get a higher score than "John Allen". – GNG May 12 '20 at 06:36
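The limitation described in the comment can be reproduced without a cluster. The sketch below approximates the question's `my_analyzer` in plain Python (3 to 4 character grams over letters, digits, `'` and `-`, lowercased). The `ngrams` function is a hypothetical stand-in, not Elasticsearch code, and it assumes grams are emitted per start offset in increasing length, as in the documented ngram tokenizer examples.

```python
def ngrams(text, min_gram=3, max_gram=4, extra_chars="'-"):
    """Approximate my_analyzer: split on non-token chars, emit 3-4 grams, lowercase."""
    tokens, word = [], ""
    for ch in text + " ":  # the trailing space flushes the last word
        if ch.isalpha() or ch.isdigit() or ch in extra_chars:
            word += ch.lower()
            continue
        # End of a word: emit every gram of each allowed size, by start offset.
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
        word = ""
    return tokens


indexed = ngrams("Allen John")
reversed_doc = ngrams("John Allen")
typo_query = ngrams("Alen John")

# Same grams, different order: an OR-of-terms query sees the two documents as equal.
print(sorted(indexed) == sorted(reversed_doc))  # True

# Grams produced by the misspelled query that "Allen John" never indexed.
print([g for g in typo_query if g not in indexed])  # ['ale', 'alen']
```

As far as I understand phrase matching, every analyzed term of the phrase has to be present, so the missing grams `ale` and `alen` are enough to make the quoted clause fail for "Alen John". That leaves only the order-insensitive clause to score both documents.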