I have an Elasticsearch v2.4.1 index in which I store values from a JSON feed. Sometimes the characters of a field value arrive separated by spaces, like:
"titulo" : "E l a ñ o q u e e l m e r c a d o d e j ó d e a s u s t a r"
This happens around 15% of the time and prevents queries such as:
localhost:9200/indice/_search?q=titulo:mercado
from matching the document above.
I think the problem could be solved by using some sort of CharFilter. I thought of the N-gram filter, but that does the opposite. I know this might be complex, since ES would, at some level, have to infer the language (or maybe I could specify it), deal with ambiguities, and so on.
More examples of the same problem:
"title" : "El g a l a r d ó n se e n t r e g a r á el p r ó x i m o día 2 4"
"title" : "G a m a a c t u a l i z a d a d e b o m b a s d e calor A q u a t e r m i c"
"title" : "K a s p e r s k y : m á s q u e a n t i v i r u s"
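To make the ambiguity concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code; the helper name `collapse_spaced_letters` and the "three or more single characters" threshold are my own assumptions) of what a naive regex-based collapse would do to these titles:

```python
import re

# A run of three or more single characters, each separated by exactly one space.
# (?<!\S) and (?!\S) stop the run from starting or ending inside a longer token.
SPACED_RUN = re.compile(r"(?<!\S)\S(?: \S){2,}(?!\S)")

def collapse_spaced_letters(text):
    """Remove the spaces inside each run of space-separated single characters."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)

mixed = "El g a l a r d ó n se e n t r e g a r á el p r ó x i m o día 2 4"
fully_spaced = "E l a ñ o q u e e l m e r c a d o d e j ó d e a s u s t a r"

# Intact words act as separators here, so the spaced-out words are recovered.
print(collapse_spaced_letters(mixed))
# The whole title is one run, so every word boundary is lost.
print(collapse_spaced_letters(fully_spaced))
```

When only some words are spaced out, the intact words around them preserve the boundaries. When the whole title is spaced out, the collapse yields a single long token, so a query like titulo:mercado would presumably still miss it. That is the part where I assume some language knowledge would be needed to re-split the words.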