
I'm having trouble setting up a search_as_you_type field with highlighting following the guide here https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html

I'll leave a series of commands to reproduce what I'm seeing. Hopefully somebody can weigh in on what I'm missing :)

  1. create mapping
PUT /test_index
{
  "mappings": {
    "properties": {
      "plain_text": {
        "type": "search_as_you_type",
        "index_options": "offsets",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
  2. insert document
POST /test_index/_doc
{
  "plain_text": "This is some random text"
}
  3. search for document
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "rand",
      "type": "bool_prefix",
      "fields": [
        "plain_text",
        "plain_text._2gram",
        "plain_text._3gram",
        "plain_text._index_prefix"
      ]
    }
  },
  "highlight" : {
    "fields" : [
      {
        "plain_text": {
          "number_of_fragments": 1,
          "no_match_size": 100
        } 
      }
    ]
  }
}
  4. response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "rLZkjm8BDC17cLikXRbY",
        "_score" : 1.0,
        "_source" : {
          "plain_text" : "This is some random text"
        },
        "highlight" : {
          "plain_text" : [
            "This is some random text"
          ]
        }
      }
    ]
  }
}

The response I get back does not have the highlighting I expect. Ideally the highlight would be: This is some <em>ran</em>dom text

James

1 Answer


In order to achieve highlighting of n-grams (chars) you'll need:

  • a custom ngram tokenizer. By default the maximum allowed difference between min_gram and max_gram is 1, so in my example highlighting works only for search terms of length 3 or 4. You can generate more n-gram sizes by widening the min_gram/max_gram range, which requires setting a higher value for index.max_ngram_diff.
  • a custom analyzer based on the custom tokenizer
  • a "plain_text.ngrams" subfield in the mapping that uses this analyzer; this is the field to highlight on
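
To see why only search terms of length 3 or 4 get highlighted with min_gram 3 and max_gram 4, here is a rough Python approximation of what the analyzer indexes. It is not Elasticsearch code: char_ngrams is a hypothetical helper, and it assumes the tokenizer keeps every character, spaces included (an empty token_chars list). Since search_analyzer is "standard", the query term is not split into grams, so it has to equal one of the indexed grams.

```python
def char_ngrams(text, min_gram=3, max_gram=4):
    """Approximate an ngram tokenizer followed by a lowercase filter.

    Assumption: every character is kept (spaces included), so grams
    may span word boundaries.
    """
    text = text.lower()
    grams = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.add(text[start:start + size])
    return grams


indexed = char_ngrams("This is some random text")

# The search side is not n-grammed, so the whole term must be an indexed gram.
for term in ["ra", "ran", "rand", "rando"]:
    print(term, term in indexed)
```

Under these assumptions "ran" and "rand" are found, while "ra" (too short) and "rando" (too long) are not, which is why the gram range has to cover every term length you want highlighted.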

Here's the configuration:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "partial_words" : {
          "type": "custom",
          "tokenizer": "ngrams",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "ngrams": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "plain_text": {
        "type": "text",
        "fields": {
          "shingles": { 
            "type": "search_as_you_type"
          },
          "ngrams": {
            "type": "text",
            "analyzer": "partial_words",
            "search_analyzer": "standard",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}

the query:

{
  "query": {
    "multi_match": {
      "query": "rand",
      "type": "bool_prefix",
      "fields": [
        "plain_text.shingles",
        "plain_text.shingles._2gram",
        "plain_text.shingles._3gram",
        "plain_text.shingles._index_prefix",
        "plain_text.ngrams"
      ]
    }
  },
  "highlight" : {
    "fields" : [
      {
        "plain_text.ngrams": { } 
      }
    ]
  }
}

and the result:

    "hits": [
        {
            "_index": "test_index",
            "_type": "_doc",
            "_id": "FkHLVHABd_SGa-E-2FKI",
            "_score": 2,
            "_source": {
                "plain_text": "This is some random text"
            },
            "highlight": {
                "plain_text.ngrams": [
                    "This is some <em>rand</em>om text"
                ]
            }
        }
    ]

Note: indexing n-grams together with term vectors can be expensive in memory usage and storage, especially with a wide min_gram/max_gram range or long field values.
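
As a back-of-the-envelope illustration of that cost, the sketch below counts how many grams a single value produces for two gram ranges. count_ngrams is a hypothetical helper, not an Elasticsearch API, and it assumes one gram per start position and size with no characters dropped. The (3, 10) range is only an example of a wider setting.

```python
def count_ngrams(text, min_gram, max_gram):
    """Count character n-grams emitted for one field value.

    Assumption: one gram per (start position, size) pair and no
    characters filtered out, so a size-s gram has len(text) - s + 1 starts.
    """
    n = len(text)
    return sum(max(n - size + 1, 0) for size in range(min_gram, max_gram + 1))


text = "This is some random text"  # 24 characters
narrow = count_ngrams(text, 3, 4)
wide = count_ngrams(text, 3, 10)
print(narrow, wide)  # prints: 43 148
```

Each of those grams is stored with positions and offsets when term_vector is "with_positions_offsets", so widening the range grows the index quickly.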

Catalin M.
  • I had moved from 'search_as_you_type' to an ngram analyzer, but this answer still applies to me! I still needed to move my ngram from the filter to the tokenizer. Thanks :) – James Feb 19 '20 at 22:34
  • With the query "ran" this solution produces the highlight: "This is some <em>rand</em>om text" which ends up highlighting "rand" instead of highlighting just "ran". – piggs_boson Jul 13 '20 at 00:45
  • @catalin-m One question: I have a query like 'good developer', and indexed documents such as 'developer and tester', 'not only developer', 'good developer', 'good tester'. When I query with search_as_you_type, I get every document that includes 'developer'. Do you have any idea why? – Vishnu Oct 02 '20 at 05:39
  • @James Could you please share how you moved from 'search_as_you_type' to an ngram analyzer? – Vishnu Oct 02 '20 at 06:20
  • @Vishnu happy to take a look at your problem, but the comment section here isn't the best place. It's very hard for me to reproduce your issue without more information. My recommendation is to write up a question the way I did, with the mapping/index, document creation and query commands. Then describe what you see and why that is not what you expect to see. Basically, try to provide as much information as possible to make the answerer's job easier. – James Oct 04 '20 at 04:55