I am fairly new to Elasticsearch, but I have managed to get results that are almost what I expect, except for one small problem. I am showing only the code relevant to the issue. I am using edgeNGram as a filter:

filter: {
  'type':'edgeNGram',
  'max_gram':10,
  'min_gram':3,
  'side': 'front',
  'minimum_should_match':'100%'
}

The results come out as expected; however, I don't get any results for words shorter than 3 characters. Three-character terms give fairly good results, but going down to 2 characters breaks the results, returning a lot of irrelevant matches.

Essentially, I would like to keep using edgeNGram from 3 characters up, but also be able to search for 2-character terms.

Looking forward to your suggestions!

scooterlord
  • You'd need a second edgeNGram filter that is part of an analyzer set on a sub-field of the field you're searching on... That way you can combine your current search on the main field for 3+ characters and another constraint on the sub-field for 2-chars tokens – Val Oct 23 '20 at 08:11
  • @Val thanks for your reply. Can you elaborate a bit more with an example? However, take into consideration that I need this functionality across all fields. – scooterlord Oct 23 '20 at 08:13
  • It'd help if you provide a reproducible example first... – Val Oct 23 '20 at 08:19
  • Actually, I just noticed there's a `preserve_original` option for the `edge_ngram` filter that reads: (Optional, boolean) Emits original token when set to true. Defaults to false. I am re-indexing and will get back with results. It's surprising how you find the solution you've been searching for for days right after you post a question on SO! – scooterlord Oct 23 '20 at 08:19
  • @Val I'm not really sure how to output an example from Elasticpress that I am using - I will look for it shortly and get back to you. Thank you in advance for the support. – scooterlord Oct 23 '20 at 08:21
  • Sure, let us know once you can provide more insights – Val Oct 23 '20 at 08:23
  • Looks like I am getting the expected results with the `preserve_original` option! – scooterlord Oct 23 '20 at 08:45
  • Awesome, glad you figured it out! – Val Oct 23 '20 at 09:20

1 Answer

Well, I've been googling this for days, and only just now found the solution to my own problem. The edgeNGram filter has a preserve_original option. The documentation reads:

(Optional, boolean) Emits original token when set to true. Defaults to false.

source: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html

This seems to be working for me, and I am now getting the expected results! I hope it helps someone who ends up here; it wasn't an easy find.
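
To illustrate why the option helps, here is a minimal Python sketch. It is not the setup from the question: the names `autocomplete_filter` and `autocomplete`, the `standard` tokenizer, and the `lowercase` filter are assumptions chosen for the example. The `edge_ngrams` helper is only a rough local approximation of the token filter, based on my understanding that the original token is kept when it is shorter than `min_gram` or longer than `max_gram`. It does not call Elasticsearch.

```python
def edge_ngrams(token, min_gram=3, max_gram=10, preserve_original=False):
    """Approximate the front edge n-grams emitted for a single token.

    Assumption: with preserve_original enabled, the original token is kept
    only when its length falls outside the [min_gram, max_gram] range.
    """
    # Prefixes of length min_gram up to max_gram (or the token length, if shorter).
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams


# Hypothetical index settings; filter and analyzer names are illustrative only.
index_settings = {
    "analysis": {
        "filter": {
            "autocomplete_filter": {
                "type": "edge_ngram",
                "min_gram": 3,
                "max_gram": 10,
                "preserve_original": True,
            }
        },
        "analyzer": {
            "autocomplete": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "autocomplete_filter"],
            }
        },
    }
}

if __name__ == "__main__":
    for word in ("ab", "hello", "elasticsearch"):
        print(word, edge_ngrams(word, preserve_original=True))
```

Without `preserve_original`, a 2-character token such as "ab" produces no tokens at all under this sketch, which would explain why short words never matched. With the option on, "ab" is indexed as-is while longer words still get their 3-to-10-character prefixes.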

scooterlord