Analyzer to ignore accents and plural/singular in Elasticsearch

Question

I am trying to ignore accents and plural/singular forms when I run a search query. I copied the Spanish analyzer from the language analyzers documentation and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html

You can check my code in Python (I bulk-load the data from a CSV later):

settings={
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
    
es.indices.create(index="activities", body=settings)

However, when I run a GET query from Insomnia for geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a doc with the title Cuerpos geométricos. It should match, since I want accents and plural/singular forms to make no difference. Any ideas?

The GET query I run:

{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "geométricos",
                    "fields": [
                        "Descripcion",
                        "Nombre",
                        "Tags"
                    ],
                    "analyzer": "rebuilt_spanish"
                }
            }
        }
    }
}

Okay, I found my error. In case it is useful for somebody: I had to add "analyzer":"rebuilt_spanish" to Descripcion, Nombre, and Tags in the mapping — Laura Galera, Jul 01 '21 at 10:54
You are using `rebuilt_spanish` analyzer in search query. What is the definition of this analyzer? — Nishant, Jul 01 '21 at 11:50
What is the mapping for the following fields: `Descripcion`, `Nombre`, `Tags`? — Nishant, Jul 02 '21 at 02:32
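
For anyone landing here, the fix described in the first comment can be sketched roughly like this. It is only a sketch under assumptions: the question never shows the mapping, so the `text` field type is a guess, the typeless `mappings` layout assumes Elasticsearch 7 or later, and `build_index_body` is a hypothetical helper name.

```python
def build_index_body():
    """Index body from the question plus a mapping that assigns the analyzer."""
    analysis = {
        "filter": {
            "spanish_stemmer": {"type": "stemmer", "language": "light_spanish"}
        },
        "analyzer": {
            "rebuilt_spanish": {
                "tokenizer": "standard",
                "filter": ["lowercase", "spanish_stemmer"],
            }
        },
    }
    # Assumption: all three fields are plain text fields.
    text_field = {"type": "text", "analyzer": "rebuilt_spanish"}
    return {
        "settings": {"analysis": analysis},
        "mappings": {
            "properties": {
                name: dict(text_field)
                for name in ("Descripcion", "Nombre", "Tags")
            }
        },
    }


body = build_index_body()

# Needs a running cluster and the `es` client from the question:
# es.indices.create(index="activities", body=body)
```

With the analyzer set on the fields, documents are stemmed at index time, and the same analyzer is used at search time by default, so the "analyzer" entry in the multi_match query should no longer be needed. The analyzer of an existing field cannot be changed in place, so the index has to be recreated and the data loaded again.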

score 0 · Answer 1 · answered Jul 01 '21 at 10:54

You will need to add the ASCII folding token filter to your token filters; see the official documentation. Your analyzer should then look like this:

Analyzer:

"analysis": {
      "filter": {
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer":  "standard",
          "filter": [
            "asciifolding", // ASCII folding token filter
            "lowercase",
            "spanish_stemmer"
          ]
        }
      }
    }

Not really, spanish_stemmer already handles accents. My error was not adding the analyzer to the mapping, but thanks — Laura Galera, Jul 01 '21 at 10:56
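
One way to check what an analyzer actually does with accents and plurals is to ask Elasticsearch for the tokens it produces through the `_analyze` API. A minimal sketch, assuming the elasticsearch-py 7.x client used in the question (where `body=` is accepted) and the `activities` index; `analyzed_tokens` is a hypothetical helper name, not part of the library.

```python
def analyzed_tokens(es, index, analyzer, text):
    """Return the tokens that `analyzer` (defined on `index`) emits for `text`."""
    response = es.indices.analyze(
        index=index,
        body={"analyzer": analyzer, "text": text},
    )
    return [token["token"] for token in response["tokens"]]


# Needs a running cluster and the index from the question:
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# for word in ["geometrico", "geométrico", "geométricos", "geometricos"]:
#     print(word, analyzed_tokens(es, "activities", "rebuilt_spanish", word))
```

If all four spellings come back as the same token, the analyzer itself is fine and the problem is elsewhere, for example in the field mapping.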
