The question title is a bit misleading, but I didn't know how to phrase it properly. Here is my scenario.

I have the phrase Water wipes (note the space in between) in the title of a product record in my Elasticsearch index. I need it to match the query waterwipes. Because the query has no space, I get zero results for waterwipes. The following is the must-match query I send to Elasticsearch (I am using PHP here):

$mustConditions = [
    [
        "nested" => [
            "path"  => "name",
            "query" => [
                "multi_match" => [
                    "query"            => (string)$query,
                    "fields"           => ['name.en^3', 'name.ar^3'],
                    "zero_terms_query" => "all",
                    "fuzziness"        => "auto",
                    "operator"         => "AND",
                ],
            ],
        ],
    ],
];

The analyzer for the field is 'english'. How do I make a query like waterwipes match a title containing Water Wipes?

Amit
Shobi

1 Answer

You need to strip the whitespace from the product title when you index it; you can then query for the joined word.

The following index settings define an analyzer that removes the whitespace:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "replace_whitespace"
                    ]
                }
            },
            "char_filter": {
                "replace_whitespace": {
                    "type": "mapping",
                    "mappings": [
                        "\\u0020=>"
                    ]
                }
            }
        }
    }
}

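After this, you can use the Elasticsearch Analyze API to confirm that the analyzer generates a token that matches your search query tokens. Run the request against the index that defines my_analyzer, since custom analyzers exist only in the index whose settings declare them:

POST <your_index>/_analyze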

{
    "text": "Water wipes",
    "analyzer" : "my_analyzer"
}

The response shows that the whitespace has been removed from the token:

{
    "tokens": [
        {
            "token": "Waterwipes",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
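
To see what this analyzer does to longer titles, its two stages can be approximated in plain Python. This is only a rough sketch and not Elasticsearch itself: the regular expression `\w+` stands in for the standard tokenizer, and the function name `approx_my_analyzer` and its `lowercase` switch are made up for illustration. The analyzer above has no lowercase token filter, so the switch shows what adding one (an assumption, not part of the settings above) would change.

```python
import re
from typing import List


def approx_my_analyzer(text: str, lowercase: bool = False) -> List[str]:
    """Roughly mimic my_analyzer: delete U+0020, then split into word tokens."""
    # char_filter "replace_whitespace": the mapping "\\u0020=>" deletes every
    # space before the tokenizer sees the text.
    filtered = text.replace("\u0020", "")
    # Stand-in for the standard tokenizer (assumption): runs of word characters.
    tokens = re.findall(r"\w+", filtered)
    # Hypothetical extra step: a lowercase token filter, which the settings
    # in the answer do not include.
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return tokens


if __name__ == "__main__":
    print(approx_my_analyzer("Water wipes"))
    print(approx_my_analyzer("Water Wipes - Mega Value Pack - 12 x 60s Wipes"))
    print(approx_my_analyzer("Water wipes", lowercase=True))
```

Because the spaces are removed everywhere, a long title is split only at the remaining punctuation such as hyphens, so words that were separate in the original title end up joined. The token also keeps its original casing unless a lowercase filter is added, which matters when the query is typed in lowercase.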

Suggestions: Store the whitespace-stripped tokens in a separate field next to the title, for example titlewospaces, apply the custom analyzer above to that field, and search both fields to get better results. You should also check the Explain API to see how the terms of your query match the indexed tokens.
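
The two-field suggestion could be sketched as follows, using Python only to build and print the JSON request bodies. The sub-field name `nospace`, the helper name `build_search_body`, the mapping itself, and the added `lowercase` filter are assumptions for illustration; the `name.en` and `name.ar` fields and the nested path come from the question.

```python
import json

# Analyzer from the answer, plus a lowercase filter (assumption) so that
# "Waterwipes" and "waterwipes" produce the same token.
SETTINGS = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "standard",
                "char_filter": ["replace_whitespace"],
                "filter": ["lowercase"],
            }
        },
        "char_filter": {
            "replace_whitespace": {"type": "mapping", "mappings": ["\\u0020=>"]}
        },
    }
}

# Hypothetical mapping: keep the english-analyzed field and add a
# whitespace-stripped sub-field (a multi-field) next to it.
MAPPINGS = {
    "properties": {
        "name": {
            "type": "nested",
            "properties": {
                "en": {
                    "type": "text",
                    "analyzer": "english",
                    "fields": {
                        "nospace": {"type": "text", "analyzer": "my_analyzer"}
                    },
                },
                # The analyzer for name.ar is not given in the question,
                # so it is left at the default here.
                "ar": {"type": "text"},
            },
        }
    }
}


def build_search_body(query: str) -> dict:
    """Search the original fields and the whitespace-stripped sub-field."""
    return {
        "query": {
            "nested": {
                "path": "name",
                "query": {
                    "multi_match": {
                        "query": query,
                        "fields": ["name.en^3", "name.ar^3", "name.en.nospace"],
                        "fuzziness": "auto",
                        "operator": "AND",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps({"settings": SETTINGS, "mappings": MAPPINGS}, indent=2))
    print(json.dumps(build_search_body("waterwipes"), indent=2))
```

Keeping the original field untouched means queries such as water wipes still match through the english analyzer, while the sub-field covers the joined spelling.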

Amit
  • Thank you for your answer. I will test this out, and I have one question: if I create the custom analyzer and apply it to the field, will it return results for *water wipes* (with a space)? So both *waterwipes* and *water wipes* will work? – Shobi Mar 02 '20 at 11:11
  • Also, the product titles look more like *Water Wipes - Mega Value Pack - 12 x 60s Wipes* than a simple *water wipes*. I guess your version removes all the whitespace in the text? – Shobi Mar 02 '20 at 11:13
  • @Shobi, yes, it will remove whitespace everywhere in the text. – Amit Mar 02 '20 at 11:21
  • Hey, sorry for the late reply. What I actually ended up doing was including the description in the index and stuffing *waterwipes* into the description so that Elasticsearch would include that result as well. Your answer was super helpful and I learned about the Explain API (thank you). – Shobi Mar 03 '20 at 11:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/208916/discussion-between-opster-elasticsearch-ninja-and-shobi). – Amit Mar 03 '20 at 11:34
  • I did upvote, but this is not an answer that can be accepted as the correct answer. – Shobi Mar 05 '20 at 19:31