
I have created a category index with completion suggesting and it is not behaving how I would expect.

curl -XPUT http://localhost:9200/categories/category/_mapping -d '{
    "category" : {
        "properties" : {
            "categoryDescription" : {
                "type" : "string"
            },
            "suggest" : {
                "type" : "completion",
                "analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

I have a category indexed for "Mexican Grocery Store" and when I search for that string I am getting zero hits and only a suggest result:

{
    "query":{
        "fuzzy":{
            "categoryDescription":{
                "value":"mexican grocery store"
            }
        }
    },
    "from":0,
    "size":20,
    "suggest":{
        "category-suggest":{
            "text":"mexican grocery store",
            "completion":{
                "field":"suggest","fuzzy":{"fuzziness":2}
            }
        }
    }
}

{
    "took":19,
    "timed_out":false,
    "_shards":{"total":5,"successful":5,"failed":0},
    "hits":{
        "total":0,"max_score":null,"hits":[]
    },
    "suggest":{
        "category-suggest":[
            {
                "text":"mexican grocery store",
                "offset":0,
                "length":21,
                "options":[
                    {
                        "text":"Mexican Grocery Store",
                        "score":1.0,
                        "payload":{"id":5915028960051200}
                    }
                ]
            }
        ]
    }
}

Not only do I get zero hits for an exact match, but when I type in the string "Mexican", a bunch of categories containing the word "Medical" are listed before the "Mexican" categories, which doesn't make sense to me either.

{
    "query":{
        "fuzzy":{
            "categoryDescription":{
                "value":"mexican"
            }
        }
    },
    "from":0,
    "size":20,
    "suggest":{
        "category-suggest":{
            "text":"mexican",
            "completion":{
                "field":"suggest","fuzzy":{"fuzziness":2}
            }
        }
    }
}

{
    "took":11,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits":{
        "total":25,
        "max_score":3.8085938,
        "hits":[
            {
                "_index":"categories",
                "_type":"category",
                "_id":"4993638215974912",
                "_score":3.8085938,
                "_source":{
                    "id":4993638215974912,
                    "categoryDescription":"Medical Spa",
                    "suggest":{
                        "input":["Medical Spa"],
                        "output":"Medical Spa",
                        "payload":{"id":4993638215974912}}}},
            {"_index":"categories","_type":"category","_id":"6401013099528192","_score":3.8085938,"_source":{"id":6401013099528192,"categoryDescription":"Medical School","suggest":{"input":["Medical School"],"output":"Medical School","payload":{"id":6401013099528192}}}},
            {"_index":"categories","_type":"category","_id":"4712163239264256","_score":3.4429123,"_source":{"id":4712163239264256,"categoryDescription":"Medical Examiner","suggest":{"input":["Medical Examiner"],"output":"Medical Examiner","payload":{"id":4712163239264256}}}},
            {"_index":"categories","_type":"category","_id":"5978800634462208","_score":3.4429123,"_source":{"id":5978800634462208,"categoryDescription":"Medical Center","suggest":{"input":["Medical Center"],"output":"Medical Center","payload":{"id":5978800634462208}}}},
            {"_index":"categories","_type":"category","_id":"5415850681040896","_score":3.4429123,"_source":{"id":5415850681040896,"categoryDescription":"Medical Clinic","suggest":{"input":["Medical Clinic"],"output":"Medical Clinic","payload":{"id":5415850681040896}}}},
            {"_index":"categories","_type":"category","_id":"4852900727619584","_score":2.75433,"_source":{"id":4852900727619584,"categoryDescription":"Medical Billing Service","suggest":{"input":["Medical Billing Service"],"output":"Medical Billing Service","payload":{"id":4852900727619584}}}},
            {"_index":"categories","_type":"category","_id":"5352079006629888","_score":2.4411354,"_source":{"id":5352079006629888,"categoryDescription":"Mexican Restaurant","suggest":{"input":["Mexican Restaurant"],"output":"Mexican Restaurant","payload":{"id":5352079006629888}}}},
            {"_index":"categories","_type":"category","_id":"5915028960051200","_score":2.143557,"_source":{"id":5915028960051200,"categoryDescription":"Mexican Grocery Store","suggest":{"input":["Mexican Grocery Store","shop"],"output":"Mexican Grocery Store","payload":{"id":5915028960051200}}}},
            {"_index":"categories","_type":"category","_id":"6392217006505984","_score":2.0527549,"_source":{"id":6392217006505984,"categoryDescription":"Latin American Restaurant","suggest":{"input":["Latin American Restaurant"],"output":"Latin American Restaurant","payload":{"id":6392217006505984}}}},
            {"_index":"categories","_type":"category","_id":"5149768867119104","_score":2.0527549,"_source":{"id":5149768867119104,"categoryDescription":"Occupational Medical Physician","suggest":{"input":["Occupational Medical Physician"],"output":"Occupational Medical Physician","payload":{"id":5149768867119104}}}},
            {"_index":"categories","_type":"category","_id":"5157465448513536","_score":2.0527549,"_source":{"id":5157465448513536,"categoryDescription":"Central American Restaurant","suggest":{"input":["Central American Restaurant"],"output":"Central American Restaurant","payload":{"id":5157465448513536}}}},
            {"_index":"categories","_type":"category","_id":"6479078425100288","_score":2.0527549,"_source":{"id":6479078425100288,"categoryDescription":"American Football Field","suggest":{"input":["American Football Field"],"output":"American Football Field","payload":{"id":6479078425100288}}}},
            {"_index":"categories","_type":"category","_id":"4789129053208576","_score":1.9529084,"_source":{"id":4789129053208576,"categoryDescription":"Mexican Goods Store","suggest":{"input":["Mexican Goods Store","shop"],"output":"Mexican Goods Store","payload":{"id":4789129053208576}}}},
            {"_index":"categories","_type":"category","_id":"5275113192685568","_score":1.9138902,"_source":{"id":5275113192685568,"categoryDescription":"Medical Laboratory","suggest":{"input":["Medical Laboratory"],"output":"Medical Laboratory","payload":{"id":5275113192685568}}}},
            {"_index":"categories","_type":"category","_id":"5838063146106880","_score":1.7436681,"_source":{"id":5838063146106880,"categoryDescription":"Medical Group","suggest":{"input":["Medical Group"],"output":"Medical Group","payload":{"id":5838063146106880}}}},
            {"_index":"categories","_type":"category","_id":"4649491076481024","_score":1.7436681,"_source":{"id":4649491076481024,"categoryDescription":"American Restaurant","suggest":{"input":["American Restaurant"],"output":"American Restaurant","payload":{"id":4649491076481024}}}},
            {"_index":"categories","_type":"category","_id":"5458456756617216","_score":1.5311122,"_source":{"id":5458456756617216,"categoryDescription":"Traditional American Restaurant","suggest":{"input":["Traditional American Restaurant"],"output":"Traditional American Restaurant","payload":{"id":5458456756617216}}}},
            {"_index":"categories","_type":"category","_id":"6183309797228544","_score":1.5311122,"_source":{"id":6183309797228544,"categoryDescription":"Public Medical Center","suggest":{"input":["Public Medical Center"],"output":"Public Medical Center","payload":{"id":6183309797228544}}}},
            {"_index":"categories","_type":"category","_id":"6706677332049920","_score":1.5311122,"_source":{"id":6706677332049920,"categoryDescription":"Native American Goods Store","suggest":{"input":["Native American Goods Store","shop"],"output":"Native American Goods Store","payload":{"id":6706677332049920}}}},
            {"_index":"categories","_type":"category","_id":"6119538122817536","_score":1.3949344,"_source":{"id":6119538122817536,"categoryDescription":"Medical Supply Store","suggest":{"input":["Medical Supply Store","shop"],"output":"Medical Supply Store","payload":{"id":6119538122817536}}}}
        ]
    },
    "suggest":{
        "category-suggest":[
            {
                "text":"mexican",
                "offset":0,
                "length":7,
                "options":[
                    {"text":"Medical Billing Service","score":1.0,"payload":{"id":4852900727619584}},
                    {"text":"Medical Center","score":1.0,"payload":{"id":5978800634462208}},
                    {"text":"Medical Clinic","score":1.0,"payload":{"id":5415850681040896}},
                    {"text":"Medical Examiner","score":1.0,"payload":{"id":4712163239264256}},
                    {"text":"Medical Group","score":1.0,"payload":{"id":5838063146106880}}
                ]
            }
        ]
    }
}

Robert Garcia

1 Answer


You index the field categoryDescription as string, so Elasticsearch is running its standard analyzer on your input and turns Mexican Grocery Store into three tokens, [mexican, grocery, store].

The fuzzy query belongs to the family of term queries, that is, it operates on the term level and doesn't run its input through any analyzer. A fuzzy query with the input Mexican Grocery Store will try to match these words as one term, not as three different ones. It doesn't find anything, since the complete phrase doesn't exist as a single term in the index. You could add a subfield to categoryDescription that is not analyzed, or that uses just a lowercase token filter, and run the fuzzy query on this subfield to produce the "exact match".
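
A minimal sketch of such a subfield, assuming an Elasticsearch 1.x/2.x index (the string type in your mapping suggests that) and a custom analyzer that I am calling lower here; the name and the exact settings are illustrative and would go into the body used to create the index. The keyword tokenizer keeps the whole description as a single token and the lowercase filter normalizes the case:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "category": {
      "properties": {
        "categoryDescription": {
          "type": "string",
          "fields": {
            "lower": {
              "type": "string",
              "analyzer": "lower"
            }
          }
        }
      }
    }
  }
}
```

With that in place, the fuzzy query can target categoryDescription.lower. Since the fuzzy query is not analyzed, the input has to be lowercased on the client side:

```json
{
  "query": {
    "fuzzy": {
      "categoryDescription.lower": {
        "value": "mexican grocery store"
      }
    }
  }
}
```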

For the second part, the fuzzy query does not differentiate between matches that were modified (where fuzziness was applied) and exact matches. Before the actual search is executed, the fuzzy term is internally matched against a list of all terms in the given field and expanded. In your example, it turns into something like

"bool": {
  "should": [
    {
      "term": {
        "categoryDescription": "medical"
      }
    },
    {
      "term": {
        "categoryDescription": "mexican"
      }
    }
  ]
}

From this, it's clear why things like Medical Spa are returned at all. Those hits also have a higher score than Mexican Grocery Store, so they are returned first. I suspect this is due to term frequencies (Medical appears more often than Mexican), but you should run the query again with explain enabled to see exactly why the score is higher.
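
For reference, a sketch of your query with explain turned on. The explain flag is a top-level option of the search request body, and each hit then carries an _explanation section that breaks its score down into the contributing factors:

```json
{
  "explain": true,
  "query": {
    "fuzzy": {
      "categoryDescription": {
        "value": "mexican"
      }
    }
  },
  "from": 0,
  "size": 20
}
```

Comparing the _explanation of a Medical hit with that of a Mexican hit should show which factor makes the difference.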

If you want to apply a penalty to fuzzy matches, you can wrap a fuzzy and a term query into a boolean query:

{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "categoryDescription": "mexican"
          }
        },
        {
          "term": {
            "categoryDescription": "mexican"
          }
        }
      ]
    }
  }
}

This will cut the score of documents where only the fuzzy part matches in half (due to the coord factor of boolean queries).
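
If halving the score is not enough of a penalty, one option is to additionally boost the exact term clause. This is only a sketch that I have not run against your index, and the boost value of 2.0 is an arbitrary assumption you would need to tune:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "categoryDescription": "mexican"
          }
        },
        {
          "term": {
            "categoryDescription": {
              "value": "mexican",
              "boost": 2.0
            }
          }
        }
      ]
    }
  }
}
```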

knutwalker
  • Oh wow! What a great explanation. Thank you so much :) The explanation of how the fuzzy query works really helps me better understand how all of this works. Is there a better setup from the get-go for the index that I could do that would help achieve what I am expecting? – Robert Garcia May 16 '16 at 17:40
  • I had tried using an nGram analyzer at one point but was running into problems where I had a category Bar and other categories with the word Bar in them like Barn for instance and when the user typed in "Bar" words with Bar in it were appearing first in the list and again wouldn't just show Bar, an exact match, first. – Robert Garcia May 16 '16 at 17:47
  • nGram has a similar issue; it produces the same terms for `Bar` and `Barn`, making it impossible to differentiate between those two. Here is a play fiddle for the two suggestions from my answer: https://www.found.no/play/gist/dba4dec7672146946de9c80fb8656e7c – If you want to know how you can achieve what you want, it's better to ask a new question where you describe your goals and ask exactly that. – knutwalker May 16 '16 at 19:01
  • Okay cool. Thanks for the play fiddle link! I will implement the changes and see how it plays out. If it still isn't what I am looking for I will. Thank you again knutwalker! :) – Robert Garcia May 16 '16 at 19:06
  • So I implemented both kinds of queries you suggested. Thank you again for that! Questions after implementing them though. In query one, why do you add ".lower" to categoryDescription and not in query two? Query two won't work with it attached, as you explained, I discovered. Query two is great because it is pulling all the categories with "Mexican" in them before Medical, just like you said, but for an exact match the hits are empty again. Is there a combination of the two solutions? – Robert Garcia May 17 '16 at 01:03
  • Never mind! I am a dummy! I put the lower on just the fuzzy categoryDescription part of the query and not the term and it combines the two solutions :) Though still what does the lower do? Does that implement the analyzer? – Robert Garcia May 17 '16 at 02:10
  • The mapping defines a subfield with the name `lower` that's using a different analyzer (also named `lower`) which does *not* tokenize the input string. The first query uses `.lower` since the input consists of multiple tokens. In the second query, you can only match by using the tokenized fields. – knutwalker May 17 '16 at 11:46
  • Cool. Thank you for the reply :) Now I need to figure out how to allow "Mexican Store" to pull up "Mexican Grocery Store" and having dashes in the fields still pull results. But that is for another stackoverflow question if I can't get it. – Robert Garcia May 17 '16 at 17:17
  • Hi again! If you had any time and this was something plausible could you check out this thread? If not no worries :) You have been such a great help already! http://stackoverflow.com/questions/37327161/elasticsearch-fuzzy-phrase-completion-suggestor-and-dashes – Robert Garcia Jun 29 '16 at 02:22