
I have the following query:

GET /nameofmyindex/_analyze
{
  "text" : "Limousinetesting",
  "explain": true,
  "analyzer": "default"
}

That results in:

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [
        {
          "token" : "Limousinetesting",
          "start_offset" : 0,
          "end_offset" : 16,
          "type" : "<ALPHANUM>",
          "position" : 0,
          "bytes" : "[4c 69 6d 6f 75 73 69 6e 65 74 65 73 74 69 6e 67]",
          "positionLength" : 1,
          "termFrequency" : 1
        }
      ]
    },
    "tokenfilters" : [ ]
  }
}

And my index configuration looks like this:

{
   "nameofmyindex":{
      "aliases":{

      },
      "mappings":{
         "properties":{
            "author":{
               "type":"integer"
            },
            "body:value":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "changed":{
               "type":"date",
               "format":"epoch_second"
            },
            "created":{
               "type":"date",
               "format":"epoch_second"
            },
            "id":{
               "type":"keyword"
            },
            "promote":{
               "type":"boolean"
            },
            "search_api_language":{
               "type":"keyword"
            },
            "sticky":{
               "type":"boolean"
            },
            "title":{
               "type":"text",
               "boost":5.0,
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "type":{
               "type":"keyword"
            }
         }
      },
      "settings":{
         "index":{
            "number_of_shards":"1",
            "provided_name":"nameofmyindex",
            "creation_date":"1579792687839",
            "analysis":{
               "filter":{
                  "stop":{
                     "type":"stop",
                     "stopwords":[
                        "i",
                        "me",
                        "my",
                        "myself"
                     ]
                  },
                  "synonym":{
                     "type":"synonym",
                     "lenient":"true",
                     "synonyms":[
                        "P-Card, P Card => P-Card",
                        "limousinetesting => limousine"
                     ]
                  }
               },
               "analyzer":{
                  "default":{
                     "type":"custom",
                     "filters":[
                        "lowercase",
                        "stop",
                        "synonym"
                     ],
                     "tokenizer":"standard"
                  }
               }
            },
            "number_of_replicas":"1",
            "uuid":"QTlVnyWVRLayEfPWTrcgdg",
            "version":{
               "created":"7050199"
            }
         }
      }
   }
}

As you can see, the filters of the default analyzer are not applied: the tokenfilters array in the explain output is empty, and the word 'Limousinetesting' does not get its 'limousine' synonym.

What should the analyzer configuration look like so that the filters take effect? Even the simplest filter, lowercase, is not applied here.

1 Answer


The problem is in the syntax of your index settings; I was able to reproduce your issue and fix it. You used filters as the key of the JSON array that defines the analyzer's token filters, while it should be filter, even though you can define many filters in that array, as explained in the ES official example.
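
For clarity, here is just the analyzer definition extracted from the corrected settings further below; only the key name changes, the filter list itself stays the same:

"analyzer": {
    "default": {
        "type": "custom",
        "filter": [
            "lowercase",
            "stop",
            "synonym"
        ],
        "tokenizer": "standard"
    }
}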

Please find below the proper format for creating the index:

{
    "mappings": {
        "properties": {
            "author": {
                "type": "integer"
            },
            "body:value": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "changed": {
                "type": "date",
                "format": "epoch_second"
            },
            "created": {
                "type": "date",
                "format": "epoch_second"
            },
            "id": {
                "type": "keyword"
            },
            "promote": {
                "type": "boolean"
            },
            "search_api_language": {
                "type": "keyword"
            },
            "sticky": {
                "type": "boolean"
            },
            "title": {
                "type": "text",
                "boost": 5,
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "type": {
                "type": "keyword"
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": "1",
            "analysis": {
                "filter": {
                    "stop": {
                        "type": "stop",
                        "stopwords": [
                            "i",
                            "me",
                            "my",
                            "myself"
                        ]
                    },
                    "synonym": {
                        "type": "synonym",
                        "lenient": "true",
                        "synonyms": [
                            "P-Card, P Card => P-Card",
                            "limousinetesting => limousine"
                        ]
                    }
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "filter": [ --> Notice the change in filters to filter 
                            "lowercase",
                            "stop",
                            "synonym"
                        ],
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_replicas": "1"
        }
    }
}

Now, when I created the index with the above settings and hit the _analyze API with your text, I got the synonym token limousine, as shown in the output below.
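
For reference, this is the analyze call I used (the request from your question, minus "explain": true, which is why the response is the compact token list):

GET /nameofmyindex/_analyze
{
  "text": "Limousinetesting",
  "analyzer": "default"
}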

{
    "tokens": [
        {
            "token": "limousine",
            "start_offset": 0,
            "end_offset": 16,
            "type": "SYNONYM",
            "position": 0
        }
    ]
}
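
As a final sanity check (assuming you index a document whose title contains Limousinetesting), a match query for the synonym should now find it, because limousine is the token that actually gets stored in the inverted index:

GET /nameofmyindex/_search
{
  "query": {
    "match": {
      "title": "limousine"
    }
  }
}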