
I have the following query:

GET /nameofmyindex/_analyze
{
  "text" : "Limousinetesting",
  "explain": true,
  "analyzer": "default"
}

That results in:

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [
        {
          "token" : "Limousinetesting",
          "start_offset" : 0,
          "end_offset" : 16,
          "type" : "<ALPHANUM>",
          "position" : 0,
          "bytes" : "[4c 69 6d 6f 75 73 69 6e 65 74 65 73 74 69 6e 67]",
          "positionLength" : 1,
          "termFrequency" : 1
        }
      ]
    },
    "tokenfilters" : [ ]
  }
}

And my index configuration looks like this:

{
   "nameofmyindex":{
      "aliases":{

      },
      "mappings":{
         "properties":{
            "author":{
               "type":"integer"
            },
            "body:value":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "changed":{
               "type":"date",
               "format":"epoch_second"
            },
            "created":{
               "type":"date",
               "format":"epoch_second"
            },
            "id":{
               "type":"keyword"
            },
            "promote":{
               "type":"boolean"
            },
            "search_api_language":{
               "type":"keyword"
            },
            "sticky":{
               "type":"boolean"
            },
            "title":{
               "type":"text",
               "boost":5.0,
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "type":{
               "type":"keyword"
            }
         }
      },
      "settings":{
         "index":{
            "number_of_shards":"1",
            "provided_name":"nameofmyindex",
            "creation_date":"1579792687839",
            "analysis":{
               "filter":{
                  "stop":{
                     "type":"stop",
                     "stopwords":[
                        "i",
                        "me",
                        "my",
                        "myself"
                     ]
                  },
                  "synonym":{
                     "type":"synonym",
                     "lenient":"true",
                     "synonyms":[
                        "P-Card, P Card => P-Card",
                        "limousinetesting => limousine"
                     ]
                  }
               },
               "analyzer":{
                  "default":{
                     "type":"custom",
                     "filters":[
                        "lowercase",
                        "stop",
                        "synonym"
                     ],
                     "tokenizer":"standard"
                  }
               }
            },
            "number_of_replicas":"1",
            "uuid":"QTlVnyWVRLayEfPWTrcgdg",
            "version":{
               "created":"7050199"
            }
         }
      }
   }
}

As you can see, the filters of the default analyzer are not applied: the tokenfilters array in the explain output is empty, and the word 'Limousinetesting' does not get its 'limousine' synonym.

What should the analyzer configuration look like so that the filters take effect? Even the simplest filter, lowercase, is not applied here.

1 Answer


The problem is in the syntax of your index settings; I was able to reproduce your issue and fix it. You used filters as the key of the JSON array that defines the analyzer's token filters, while it should be filter, even though you can define many filters in that array, as explained in the ES official example.
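
For clarity, here is just the analyzer definition extracted from the corrected settings further below; only the key name changes, the filter list itself stays the same:

"analyzer": {
    "default": {
        "type": "custom",
        "filter": [
            "lowercase",
            "stop",
            "synonym"
        ],
        "tokenizer": "standard"
    }
}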

Please find below the proper format for creating the index:

{
    "mappings": {
        "properties": {
            "author": {
                "type": "integer"
            },
            "body:value": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "changed": {
                "type": "date",
                "format": "epoch_second"
            },
            "created": {
                "type": "date",
                "format": "epoch_second"
            },
            "id": {
                "type": "keyword"
            },
            "promote": {
                "type": "boolean"
            },
            "search_api_language": {
                "type": "keyword"
            },
            "sticky": {
                "type": "boolean"
            },
            "title": {
                "type": "text",
                "boost": 5,
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "type": {
                "type": "keyword"
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": "1",
            "analysis": {
                "filter": {
                    "stop": {
                        "type": "stop",
                        "stopwords": [
                            "i",
                            "me",
                            "my",
                            "myself"
                        ]
                    },
                    "synonym": {
                        "type": "synonym",
                        "lenient": "true",
                        "synonyms": [
                            "P-Card, P Card => P-Card",
                            "limousinetesting => limousine"
                        ]
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "filter": [ --> Notice the change in filters to filter 
                            "lowercase",
                            "stop",
                            "synonym"
                        ],
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_replicas": "1"
        }
    }
}

Now, when I created the index with the above settings and hit the _analyze API with your text, I got the synonym token limousine, as shown in the output below.
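
For reference, this is the analyze call I used (the request from your question, minus "explain": true, which is why the response is the compact token list):

GET /nameofmyindex/_analyze
{
  "text": "Limousinetesting",
  "analyzer": "default"
}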

{
    "tokens": [
        {
            "token": "limousine",
            "start_offset": 0,
            "end_offset": 16,
            "type": "SYNONYM",
            "position": 0
        }
    ]
}
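
As a final sanity check (assuming you index a document whose title contains Limousinetesting), a match query for the synonym should now find it, because limousine is the token that actually gets stored in the inverted index:

GET /nameofmyindex/_search
{
  "query": {
    "match": {
      "title": "limousine"
    }
  }
}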