I used the following settings to create an Elasticsearch index:

"settings": {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_stemmer"]
            }
        },
        "filter" : {
            "my_stemmer" : {
                "type" : "stemmer",
                "name" : "english"
            }
        }
    }
}

I noticed that during analysis the stemmer replaces the original token with its stemmed form. Is there a way to index both the original string and the stemmed token?

Rohit Patwa

1 Answer

What you are asking for is effectively a "preserve_original" parameter for the stemmer token filter.

You will find "preserve_original" on, for example, the Word Delimiter Token Filter, but not on the stemmer token filter.

If you need the original word, e.g. for aggregation, you can copy the field to another field that uses a suitable analyzer.
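
A minimal sketch of that second-field approach, written as a multi-field mapping rather than an explicit copy. It assumes a recent Elasticsearch version with typeless mappings (older versions need a mapping type name and the "string" field type), and the field name "title" and the sub-field names "original" and "raw" are hypothetical. The main field is stemmed with my_analyzer from the question, "title.original" keeps unstemmed tokens for search, and "title.raw" keeps the whole string for aggregation:

{
    "mappings" : {
        "properties" : {
            "title" : {
                "type" : "text",
                "analyzer" : "my_analyzer",
                "fields" : {
                    "original" : {
                        "type" : "text",
                        "analyzer" : "standard"
                    },
                    "raw" : {
                        "type" : "keyword"
                    }
                }
            }
        }
    }
}

A query can then target "title", "title.original", or both, depending on whether stemmed matches are wanted.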

If you need the original at the same position in the index, you have to wrap the stemmer and build your own analyzer as a plugin.
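
Before writing a plugin, it may be worth checking whether the built-in "keyword_repeat" token filter covers this case. It emits every token twice at the same position, once marked as a keyword, and stemmers leave keyword-marked tokens untouched. The sketch below is an assumption about how it would fit the settings from the question, not a tested configuration: the filter name "unique_stem" is made up here, and the "standard" token filter from the question is left out because it was a no-op and no longer exists in newer versions. The "unique" filter with "only_on_same_position" drops the duplicate when stemming did not change the word:

{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "keyword_repeat", "my_stemmer", "unique_stem"]
                }
            },
            "filter" : {
                "my_stemmer" : {
                    "type" : "stemmer",
                    "name" : "english"
                },
                "unique_stem" : {
                    "type" : "unique",
                    "only_on_same_position" : true
                }
            }
        }
    }
}

Running a sample text through the _analyze API with this analyzer should show whether both the original and the stemmed token appear at each position on your version.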

Karsten R.
  • This is a little old but thanks for the answer, separate fields for tokenized vs original makes a lot of sense. Had a follow up if you see this. I may be misinterpreting, but you seem to imply that you shouldn't need the original word (except for aggregation or some other use), but why wouldn't you want the original word for search purposes? If the original term was "strawberries" which gets stemmed to "strawberry", then searching "strawberries" should yield a result. I'm fairly new to elastic so I'm concerned I'm missing something in my search implementation. Thanks! – bryan60 May 18 '18 at 15:44
  • Hi bryan60, here is an example where you want to search on the original word in the context of stemmed words: searching for a phrase in which one word is a name (of a person, a city, ...). You don't want the stemmer applied to names, but you do want it for the rest of the phrase. – Karsten R. May 18 '18 at 16:56