I am using Elasticsearch 6.8 for text search. I realised that the Elasticsearch tokenizer breaks text into words using the delimiters listed here: http://unicode.org/reports/tr29/#Default_Word_Boundaries. I am using match_phrase to search one of the fields in my document, and I'd like to remove one delimiter from the set the tokenizer uses.
I did some searching and found a few solutions. One is to use keyword rather than text. This solution would have a big impact on my search function because it doesn't support partial queries.
Another solution is to use keyword but query it with a wildcard to support partial queries. But this may hurt query performance, and I would still like the tokenizer to split on the other delimiters.
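For reference, the wildcard approach would look roughly like the request body built below. This is only a sketch: the field name title.keyword and the search fragment are hypothetical, and the fragment is not escaped, so any * or ? inside it would itself act as a wildcard.

```python
import json


def wildcard_query(field, fragment):
    """Build a wildcard query body that matches `fragment` anywhere in a keyword field."""
    # The leading "*" is what makes this kind of query expensive on large indices.
    return {"query": {"wildcard": {field: {"value": f"*{fragment}*"}}}}


# "title.keyword" and "foo-bar" are placeholder values, not names from my real index.
body = wildcard_query("title.keyword", "foo-bar")
print(json.dumps(body, indent=2))
```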
A third option is to use tokenize_on_chars to define all the characters used to tokenize text. But this requires me to list all the other delimiters. So I am looking for something like tokenize_except_chars.
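To show what the third option involves, here is a minimal sketch that generates index settings for a char_group tokenizer with every delimiter spelled out except one. The kept character ("-"), the tokenizer and analyzer names, and the lowercase filter are assumptions for illustration. The list only covers whitespace and ASCII punctuation, so it does not reproduce the full Unicode word-boundary rules of the standard tokenizer.

```python
import json
import string


def char_group_settings(keep):
    """Build index settings that split on whitespace and ASCII punctuation except `keep`."""
    # "whitespace" is a named character group; the rest are single characters.
    split_on = ["whitespace"] + [c for c in string.punctuation if c != keep]
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # Hypothetical tokenizer name.
                    "split_except_one": {
                        "type": "char_group",
                        "tokenize_on_chars": split_on,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name; the lowercase filter is an assumption.
                    "split_except_one_analyzer": {
                        "type": "custom",
                        "tokenizer": "split_except_one",
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


# "-" stands in for the one delimiter I want to keep inside tokens.
settings = char_group_settings("-")
print(json.dumps(settings, indent=2))
```

This avoids typing the list by hand, but it is still an explicit allow-list rather than a true "everything except" option, which is why I am asking whether something simpler exists.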
So is there an easy way to take one character out of the delimiters the tokenizer uses in Elasticsearch 6.8?