I'm using FOSElasticaBundle to index ES documents with this config:

index:
analysis:
    analyzer:
        custom_analyzer:
            type:           custom
            tokenizer:      nGram
            filter:         [stopwords, asciifolding, lowercase, snowball, elision, worddelimiter]
        custom_search_analyzer:
            type:           custom
            tokenizer:      standard
            filter:         [stopwords, asciifolding, lowercase, snowball, elision, worddelimiter]
    tokenizer:
        nGram:
            type:           nGram
            min_gram:       2
            max_gram:       20
    filter:
        snowball:
            type:           snowball
            language:       French
        elision:
            type:           elision
            articles:       [l, m, t, qu, n, s, j, d]
        stopwords:
            type:           stop
            stopwords:      [_french_]
            ignore_case:    true
        worddelimiter:
            type:           word_delimiter
    types:
        document:
            indexable_callback:         'isIndexable'
            mappings:
                title:
                    boost:              3
                    index_analyzer:     custom_analyzer
                    search_analyzer:    custom_search_analyzer
                summary:
                    boost:              2
                    index_analyzer:     custom_analyzer
                    search_analyzer:    custom_search_analyzer
                description:
                    boost:              1
                    index_analyzer:     custom_analyzer
                    search_analyzer:    custom_search_analyzer

I'm trying to use the highlighting functionality of ES. Here is an example request:

{
  "query":
  {
    "bool":
    {
      "must":
      [
        {
          "query_string": {
            "query": "blonde",
            "default_field": "_all"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "*": {  }
    }
  }
}

It gives this result:

"highlight": {

    "title": [
        "Une jeune personne b<em>personne blonde se</em><em>ersonne blonde se te</em><em>blonde se tenait e</em>n partie double, elle avait choisi."
    ]

}

The original content is: "Une jeune personne blonde se tenait en partie double, elle avait choisi."

I've run some tests with different analyzer configurations, reindexing the documents each time, but I never got a correct highlight for all the snippets: sometimes one is highlighted and not the others, sometimes none are, and so on.

How do the analyzers interact with the highlighting process? What's wrong with my config?

Lionel
1 Answer

Note that you can adjust the highlighting parameters. Try the following settings with your config:

"highlight": {
    "number_of_fragments": 5,
    "type": "plain",
    "fields": {
        "*": {
            "fragment_size": 100
        }
    }
}
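
To show where these highlight settings sit in relation to the query from the question, here is a minimal Python sketch that assembles the full request body and prints it as JSON. The helper name `build_search_body` and its parameters are hypothetical, introduced only for illustration; the field names and values come from the question and from the settings above.

```python
import json


def build_search_body(terms, fragment_size=100, number_of_fragments=5):
    """Combine the question's query with the highlight settings shown above.

    Hypothetical helper: only the JSON structure comes from the thread.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": terms, "default_field": "_all"}}
                ]
            }
        },
        "highlight": {
            "number_of_fragments": number_of_fragments,
            "type": "plain",
            "fields": {"*": {"fragment_size": fragment_size}},
        },
    }


body = build_search_body("blonde")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request in the same way as the request in the question.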

Another question may also help with the strange results: "Curious behaviour of fragment_size in elasticsearch highlighting".
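
One possible contributor to the overlapping `<em>` spans, offered as an assumption rather than a confirmed diagnosis: an nGram tokenizer with `min_gram: 2` and `max_gram: 20` emits many overlapping tokens, and a highlighter that works from token offsets may then mark several overlapping ranges around the same word. The sketch below is a simplified model in plain Python, not Elasticsearch's actual tokenizer (the function `char_ngrams` is hypothetical). It lists the character n-grams of part of the title that contain "blonde", together with their offsets.

```python
def char_ngrams(text, min_gram, max_gram):
    """Yield (start, end, gram) for every character n-gram, spaces included.

    Simplified model for illustration; not Elasticsearch's implementation.
    """
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            yield start, end, text[start:end]


text = "Une jeune personne blonde se tenait"

# Every n-gram that contains the search term, with its character offsets.
hits = [(s, e, g) for s, e, g in char_ngrams(text, 2, 20) if "blonde" in g]

# Offsets of the n-gram that is exactly the search term.
exact = [(s, e) for s, e, g in hits if g == "blonde"]

for start, end, gram in hits[:5]:
    print(start, end, repr(gram))
print("n-grams containing the term:", len(hits))
print("exact match offsets:", exact)
```

In this model only one n-gram is exactly "blonde", but many longer n-grams also contain it, each with wider offsets. If a later filter such as `word_delimiter` splits those longer n-grams into words while keeping the offsets of the whole n-gram, the highlighted ranges would overlap much like the output in the question. Indexing with a tokenizer that emits one token per word would be one way to test this assumption.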

Sylvain Martin