
I'm using the fuzzy search option in ElasticSearch. It's pretty cool.

But I ran into an issue when searching for values that contain spaces. For example, say I have two values:

"Pizza"
"Pineapple Pizza"

and I search for Pizza using this query:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

The values returned are:

"Pizza"
"Pineapple Pizza"

This is expected. But if I use the value "Pineapple Pizza" in my query:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pineapple Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

The values returned are:

""

Empty

Why is that? It should be an exact match. I'm contemplating replacing the spaces in all names with underscores, so "Pineapple Pizza" would become "Pineapple_Pizza" (this solution works for me). But I'm asking this question in the hope of finding a better alternative. What am I doing wrong here?

JD333

1 Answer


Fuzzy queries are term-level queries, which means the search text is not analyzed before it is matched against documents. In your case the standard analyzer is used on the field name, so at index time "Pineapple Pizza" is split into two lowercased tokens, pineapple and pizza. The fuzzy query then tries to match the whole search text "Pineapple Pizza" against individual terms in the index, and no single term is close to the full phrase because it was broken into two words.
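To make the difference concrete, here is a toy model in plain JavaScript. It is only a sketch under stated assumptions, not how Elasticsearch works internally: `analyze` is a rough stand-in for the standard analyzer, `editDistance` is plain Levenshtein (transpositions ignored), and the function names are made up for illustration.

```javascript
// Toy model only: this is NOT how Elasticsearch is implemented internally.

// Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Plain Levenshtein distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of the previous row at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1,                              // deletion
        row[j - 1] + 1,                         // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Edits allowed by fuzziness "AUTO" with its default length thresholds (3 and 6).
function autoFuzziness(term) {
  if (term.length < 3) return 0;
  return term.length < 6 ? 1 : 2;
}

// Term-level fuzzy query: the search value is compared as-is to each indexed token.
function fuzzyTermSearch(docs, value) {
  const maxEdits = autoFuzziness(value);
  return docs.filter((doc) =>
    analyze(doc).some((token) => editDistance(value, token) <= maxEdits)
  );
}

// Match-style query: the search text is analyzed first, then each token is
// fuzzy-matched on its own (any token matching is enough, like the default OR).
function fuzzyMatchSearch(docs, text) {
  return docs.filter((doc) => {
    const tokens = analyze(doc);
    return analyze(text).some((q) =>
      tokens.some((token) => editDistance(q, token) <= autoFuzziness(q))
    );
  });
}

const docs = ["Pizza", "Pineapple Pizza"];
console.log(fuzzyTermSearch(docs, "Pizza"));           // both documents
console.log(fuzzyTermSearch(docs, "Pineapple Pizza")); // no documents
console.log(fuzzyMatchSearch(docs, "Pineappl piz"));   // only "Pineapple Pizza"
```

In this model the un-analyzed value "Pineapple Pizza" is at least 7 edits away from every indexed token, far beyond the 2 edits a fuzzy query allows, while analyzing the search text first lets each word be compared against the tokens separately.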

You need to use a match query with fuzziness set, so that the query string is analyzed:

{
  "query": {
        "match" : {
            "item" : {
                "query" : "Pineappl piz",
                "fuzziness": "auto"
            }
        }
    }
}

Response :

 [
      {
        "_index" : "index27",
        "_type" : "_doc",
        "_id" : "p9qQDG4BLLIhDvFGnTMX",
        "_score" : 0.53372335,
        "_source" : {
          "item" : "Pineapple Pizza"
        }
      }
    ]

You can also run a fuzzy query against a keyword field, which stores the entire text as a single, unanalyzed term in the index:

{
  "query": {
    "fuzzy": {
      "item.keyword": {
        "value":"Pineapple pizz"
      }
    }
  }
}

EDIT1:

{
  "query": {
        "match" : {
            "item" : {
                "query" : "Pineapple pizza",
                "operator": "and",
                "fuzziness": "auto"
            }
        }
    }
}

"operator": "and" means that all the tokens in the query must be present in the document. The default is "or": a document matches if any one of the tokens is present. There are other options that let you define how many tokens must match, for example as a percentage.
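As a rough sketch, the two variants could be built for the Node.js client from the question like this. `buildFuzzyMatchBody` is a hypothetical helper, and the field `name` and index `food_index` are taken from the question; `operator` and `minimum_should_match` are parameters of the match query.

```javascript
// Hypothetical helper (not part of any client library): builds the request
// body for a match query with fuzziness, merging in any extra match options.
function buildFuzzyMatchBody(field, text, options = {}) {
  return {
    query: {
      match: {
        [field]: { query: text, fuzziness: "AUTO", ...options },
      },
    },
  };
}

// Every token of the search text must match (fuzziness still applies per token).
const strict = buildFuzzyMatchBody("name", "Pineapple Pizza", {
  operator: "and",
});

// At least 75% of the tokens must match. The computed number is rounded down,
// so for a 3-token search text this requires 2 matching tokens.
const lenient = buildFuzzyMatchBody("name", "The pineapple pizza", {
  minimum_should_match: "75%",
});

console.log(JSON.stringify(strict, null, 2));
console.log(JSON.stringify(lenient, null, 2));

// Usage with the client from the question (not executed here):
// client.search({ index: 'food_index', body: strict })
```

With "operator": "and", a search for "Pineapple Pizza" should no longer return documents that contain only pizza, which is the behavior asked about in the comments.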

jaspreet chahal
  • I appreciate your response. And this sort of fixed my issue. But when searching pineapple pizza I also get pizza. What if I only want pineapple pizza? – JD333 Oct 28 '19 at 01:22
  • I tried adding a third pizza Pepperoni pizza. When searching pineapple pizza I also get Pepperoni Pizza :( – JD333 Oct 28 '19 at 01:32
  • Use operator and in match query. It will mean both tokens are required , default is or. Are you using match fuzzy on keyword? – jaspreet chahal Oct 28 '19 at 01:36
  • I am using match query not the one with keyword. I will try it with the and operator – JD333 Oct 28 '19 at 02:04
  • I have added an EDIT with operator keyword. Let me know how it goes – jaspreet chahal Oct 28 '19 at 02:06
  • Dude, huge help. Thank you. What do you mean by "There are other possible combinations where you can define how many tokens should match in percent term"? Tokens are words, or split text, but I'm not sure what percent term is. – JD333 Oct 28 '19 at 02:09
  • Suppose there are 3 tokens in the searched text "The pineapple pizza". You can set the match percentage to 75%, so any document that contains any two of those words will be returned – jaspreet chahal Oct 28 '19 at 02:13
  • Sorry, I'm having trouble finding which parameter allows for that – JD333 Oct 28 '19 at 02:22
  • minimum_should_match. Link for the match query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html – jaspreet chahal Oct 28 '19 at 02:24
  • Thank you very much. Do you work with ES at work all the time? – JD333 Oct 28 '19 at 02:28
  • Glad I could be of help. I have some experience with it at work. – jaspreet chahal Oct 28 '19 at 02:29