I am trying to build an Elasticsearch filter, analyzer, and tokenizer that can normalize searches like these:

"henry&william book"     -> "henrywilliam book"
"henry & william book"   -> "henrywilliam book"
"henry and william book" -> "henrywilliam book"
"henry william book"     -> "henry william book"
In other words, I would like to normalize my "and" and "&" queries, but also concatenate the words on either side of them.
I'm thinking of making a tokenizer that breaks "henry & william book" into the tokens ["henry & william", "book"], and then a character filter that makes the following replacements:

" & "   -> ""
" and " -> ""
"&"     -> ""
However, this feels a bit hackish. Is there a better way to do it?
The reason I can't just do this entirely in the analyzer/filter phase is that it runs too late: in my attempts, Elasticsearch has already broken "henry & william" into just ["henry", "william"] before my token filter runs.
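For example, running the standard analyzer through the _analyze API:

POST /_analyze
{
  "analyzer": "standard",
  "text": "henry & william book"
}

returns "henry", "william", and "book" as three separate tokens, with the "&" already stripped, so any token filter I add only ever sees the individual words.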