I am trying to make my search queries ignore accents and singular/plural differences. I copied the Spanish analyzer from the Elasticsearch language analyzers reference and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html

Here is my Python code (I bulk-load the data from a CSV later):

settings={
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}
    
es.indices.create(index="activities", body=settings)

However, when I send a GET query from Insomnia for geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a document with the title Cuerpos geométricos. It should match, since I want searches to ignore accents and singular/plural forms. Any ideas?

The GET query I send:

{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "geométricos",
                    "fields": [
                        "Descripcion",
                        "Nombre",
                        "Tags"
                    ],
                    "analyzer": "rebuilt_spanish"
                }
            }
        }
    }
}
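One way to narrow a problem like this down is to look at what the analyzer actually emits and at what the index stores for each field. The following is only a sketch: the helper names analyzed_tokens and field_mapping are hypothetical, es is assumed to be the same elasticsearch-py client used above, and the response shape assumes Elasticsearch 7 or later (typeless mappings, index addressed by its concrete name).

```python
def analyzed_tokens(es, index, analyzer, text):
    """Return the tokens `analyzer` produces for `text` (hypothetical helper).

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    response = es.indices.analyze(
        index=index,
        body={"analyzer": analyzer, "text": text},
    )
    return [item["token"] for item in response["tokens"]]


def field_mapping(es, index):
    """Return the field definitions stored for `index` (hypothetical helper).

    Assumes the Elasticsearch 7+ response layout and that `index` is the
    concrete index name, not an alias.
    """
    response = es.indices.get_mapping(index=index)
    return response[index]["mappings"].get("properties", {})
```

Calling analyzed_tokens(es, "activities", "rebuilt_spanish", "geométricos") shows the query-side tokens, and field_mapping(es, "activities") shows whether Descripcion, Nombre, and Tags name that analyzer at all.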
Laura Galera

1 Answer

You will need to add the ASCII folding token filter (asciifolding) to your analyzer's token filters; see the official documentation. Your analyzer should look like this:

Analyzer:

"analysis": {
  "filter": {
    "spanish_stemmer": {
      "type":       "stemmer",
      "language":   "light_spanish"
    }
  },
  "analyzer": {
    "rebuilt_spanish": {
      "tokenizer":  "standard",
      "filter": [
        "asciifolding",
        "lowercase",
        "spanish_stemmer"
      ]
    }
  }
}
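To illustrate what accent folding does to a term, here is a rough Python approximation. It is not the Lucene asciifolding implementation, only a sketch of the same idea using Unicode decomposition; the function name fold_accents is made up for this example.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accent marks (rough stand-in for asciifolding)."""
    # NFKD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Cuerpos geométricos"))  # Cuerpos geometricos
```

With folding applied at both index time and query time, geométricos and geometricos end up as the same token before stemming.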
Kaveh
    Not really; spanish_stemmer already handles accents. My error was not adding the analyzer to the mapping, but thanks. – Laura Galera Jul 01 '21 at 10:56
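
The fix described in that comment, attaching the analyzer to the fields in the mapping, might look like the sketch below. The thread does not show the final mapping, so treat this as an assumption: build_index_body is a hypothetical helper, the three field names come from the query in the question, mapping them as text is a guess, and the typeless "mappings" layout assumes Elasticsearch 7 or later.

```python
def build_index_body(text_fields, analyzer_name="rebuilt_spanish"):
    """Build index settings plus a mapping that uses the custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "spanish_stemmer": {
                        "type": "stemmer",
                        "language": "light_spanish",
                    }
                },
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "spanish_stemmer"],
                    }
                },
            }
        },
        # Without this section the fields are indexed with the default
        # analyzer, so stemmed query terms have nothing to match against.
        "mappings": {
            "properties": {
                name: {"type": "text", "analyzer": analyzer_name}
                for name in text_fields
            }
        },
    }


body = build_index_body(["Descripcion", "Nombre", "Tags"])
# With a client `es` as in the question:
# es.indices.create(index="activities", body=body)
```

The analyzer of an existing field cannot be changed in place, so the index would have to be recreated and the CSV data loaded again for this to take effect.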