The fuzzy query in Elasticsearch is not working: even with the exact value, the results are empty.

ES Version: 7.6.2

Index Mapping: Below are the mapping details

{
  "movies" : {
    "mappings" : {
      "properties" : {
        "genre" : {
          "type" : "text",
          "fields" : {
            "field" : {
              "type" : "keyword"
            }
          }
        },
        "id" : {
          "type" : "long"
        },
        "rating" : {
          "type" : "double"
        },
        "title" : {
          "type" : "text"
        }
      }
    }
  }
}

Documents: The documents below are present in the index

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "title" : "Raju Ban gaya gentleman",
          "rating" : 2,
          "genre" : [
            "Drama"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "title" : "Baat ban jaegi gentleman",
          "rating" : 4,
          "genre" : [
            "Drama"
          ]
        }
      }
    ]
  }
}

Query: Below is the query I am using to search the documents

GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
    }
  }
}

I haven't used fuzzy queries before and per my understanding it should work just fine.

Sahil Gupta

1 Answer

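A fuzzy query is a term-level query, so the search value is not analyzed, but the text field is. The indexed title is split into separate lowercased tokens (baat, ban, jaegi, gentleman), while your search value Bat ban jaegi gentleman is compared as one single term against each of those tokens. None of them is within an edit distance of 1, so nothing matches.

You can also refer to the answer to "ElasticSearch's Fuzzy Query" for why a fuzzy query behaves this way on an analyzed field.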
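To make the mismatch concrete, here is a rough Python sketch of that comparison. It is only an illustration: `approx_standard_analyzer` is a hypothetical stand-in for the standard analyzer (lowercase, split on non-word characters), and it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit.

```python
import re


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def approx_standard_analyzer(text):
    # Assumption: a rough approximation of what the standard analyzer
    # produces for this simple title (lowercased word tokens).
    return re.findall(r"\w+", text.lower())


# The fuzzy query value is used as-is; it is not analyzed.
query_value = "Bat ban jaegi gentleman"
indexed_tokens = approx_standard_analyzer("Baat ban jaegi gentleman")

# Compare the whole query value against each indexed token.
distances = {token: levenshtein(query_value, token) for token in indexed_tokens}
matches = [token for token, d in distances.items() if d <= 1]

print(distances)
print(matches)  # [] : no token is within one edit of the whole query value
```

Under these assumptions every token is at least 14 edits away from the full query string, which is why the search on the text field returns nothing.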

You can see exactly how your string is tokenized with the analyze API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html

But since you want to match against the complete title, you can change the mapping of title to add a keyword sub-field as well.

Below is the mapping for the same:

"mappings": {
        "properties": {
            "genre": {
                "type": "text",
                "fields": {
                    "field": {
                        "type": "keyword"
                    }
                }
            },
            "id": {
                "type": "long"
            },
            "rating": {
                "type": "double"
            },
            "title": {
                "type": "text",
                "fields": {
                    "field": {
                        "type": "keyword"
                    }
                }
            }
        }
    }

Now if you search on title.field, you will get the desired result. The search query is:

{
  "query": {
    "fuzzy": {
      "title.field": {"value": "Bat ban jaegi gentleman", "fuzziness": 1}
    }
  }
}

The result obtained in this case is:

"hits": [
      {
        "_index": "ftestmovies",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.9381845,
        "_source": {
          "title": "Baat ban jaegi gentleman",
          "rating": 4,
          "genre": [
            "Drama"
          ]
        }
      }
    ]
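As a rough check of why this works, the sketch below compares the query value with the whole title as one term, which is how a keyword sub-field stores it (assuming no normalizer is configured on it). The `would_match` helper is hypothetical and uses plain Levenshtein distance for illustration only.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def would_match(value, stored_term, fuzziness):
    # Hypothetical helper: a term matches when it is within `fuzziness` edits.
    return levenshtein(value, stored_term) <= fuzziness


# Assumption: the keyword sub-field keeps the title verbatim as a single term.
stored_term = "Baat ban jaegi gentleman"

print(would_match("Bat ban jaegi gentleman", stored_term, 1))  # True: one missing "a"
print(would_match("bat ban jaegi gentleman", stored_term, 1))  # False: case differs too
```

The second call is a reminder that, under this assumption, the keyword term is case-sensitive, so a lowercase query value costs an extra edit.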
  • Thanks Prerna for the answer. – Sahil Gupta Apr 02 '20 at 11:29
  • I am still not sure why the query doesn't result in an exception. If ES doesn't support fuzzy search on 'text' fields, an exception should be triggered. – Sahil Gupta Apr 02 '20 at 11:51
  • ES analyzes a text field into individual tokens, which is why the fuzzy query output is not what you expect. This is explained in much more detail at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html, which you can refer to. –  Apr 02 '20 at 11:55
  • To put it simply: since your title field was of type "text", it was split into separate tokens. For example, "Baat ban jaegi gentleman" is tokenized as baat, ban, jaegi, gentleman. So when you run a fuzzy search for "Bat ban jaegi gentleman", the whole value is compared with each token separately, and the edit distance is calculated for each of them. For example, the edit distance between "Bat ban jaegi gentleman" and "ban" is 20. So no token is within an edit distance of 1. –  Apr 02 '20 at 12:14
  • I observed that the input is not analyzed either: with fuzziness = 0, "raju" results in a successful search while "Raju" doesn't. Another interesting fact is that, per my observation, fuzziness defaults to 2 if it is set higher than 2. The same is mentioned in the fuzziness section under https://www.elastic.co/guide/en/elasticsearch/reference/7.6/common-options.html#fuzziness – Sahil Gupta Apr 02 '20 at 13:09