During the last few days I've been playing around with Elasticsearch indexing and searching, and I've been able to build the different queries I intended to. My problem right now is building a query that can match text containing special characters even when I don't type them in the search bar. I'll give an example to explain what I mean.
Imagine you have an indexed document that contains a field called page content. Inside this field, part of the text could be something like:
"O carro do João é preto." (Portuguese for "João's car is black")
What I want to be able to do is type something like:
O carro do joao e preto
and still be able to get the proper match.
What I've tried so far:
I've been using the match_phrase query from the Elasticsearch documentation (here), as in the example below:
GET _search { "query": { "match_phrase": { "page content": { "query": "o carro do joao e preto" } } } }
This query gives me 0 hits, which is perfectly acceptable given that the query text differs from what was stored in that document.
I've tried setting up the ASCII Folding Token Filter (here), but I'm not sure how to use it. What I've basically done is create a new index with this request:
PUT /newindex
{
  "page content": "O carro do João é preto",
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["standard", "my_ascii_folding"]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  }
}
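To check whether the folding is actually applied, my plan was to test the analyzer with the _analyze API, with something like this (assuming the default analyzer defined in the settings above is the one being picked up):

GET /newindex/_analyze
{
  "analyzer": "default",
  "text": "O carro do João é preto"
}

Because of preserve_original, I would expect to see both "joão" and "joao" among the returned tokens.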
Then if I try to query, using the match_phrase query provided above, like this:
O carro do joao e preto
I would expect it to show me the correct result, but it isn't working for me. Am I forgetting something? I've been at this for the last two days without success, and I feel like I'm missing something.
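To be explicit, the full request I'm running against the new index is the same match_phrase query from above, just pointed at newindex, roughly like this:

GET /newindex/_search
{
  "query": {
    "match_phrase": {
      "page content": {
        "query": "o carro do joao e preto"
      }
    }
  }
}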
So my question is: what do I have to do to get the desired matching?