
I've spent around a week tinkering with Elasticsearch. I'm trying to create a search query that will enable substring search ('kua lum' => 'kuala lumpur') and fuzzy search ('koala lumpur' => 'kuala lumpur') on all fields of the documents. So far I've learned that you use multi_match for a multi-field fuzzy search and you use wildcard for a substring search (can't use nGram because it would break fuzzy search), but there's literally no information on how to combine them.

Yesterday I gave a try to Algolia and it did everything I needed right out of the box. Unfortunately, I'm working with sensitive data so I'm not allowed to host it outside the local infrastructure and even if Algolia did offer on-premise I'm afraid it would be too pricey for my banana republic to afford.

So I guess I'm stuck with Elasticsearch. Is it possible to make it do what I want it to do? I'm also free to try other search engines.

Update: I tried MeiliSearch, and it works out of the box.

1 Answer


Elasticsearch provides extremely flexible full-text search capabilities.

There are several ways to achieve this. If you know your search terms beforehand, you can use a synonym filter.

Otherwise, you can always combine two queries into one using a boolean 'should' query.
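As a rough sketch of that 'should' combination, the Python snippet below builds a request body that ORs a fuzzy `multi_match` with a `multi_match` of type `bool_prefix` (this assumes Elasticsearch 7.2 or later). The helper name `build_should_query` is hypothetical and the `city` field is only illustrative; the code builds the body and does not talk to a cluster.

```python
import json


def build_should_query(text, fields):
    """Build a bool query that ORs a fuzzy clause with a prefix clause.

    Hypothetical helper for illustration; it only builds the request body.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Typo-tolerant clause, e.g. 'koala lumpur' -> 'kuala lumpur'
                    {"multi_match": {"query": text, "fields": fields, "fuzziness": "AUTO"}},
                    # Prefix clause: the last term of the input is matched as a prefix
                    {"multi_match": {"query": text, "fields": fields, "type": "bool_prefix"}},
                ],
                # At least one of the two clauses has to match
                "minimum_should_match": 1,
            }
        }
    }


body = build_should_query("kua lum", ["city"])
print(json.dumps(body, indent=2))
```

Note that `bool_prefix` treats only the last term ('lum' here) as a prefix; earlier terms are matched as whole terms.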

Alternatively, a query like this will also return 'Kuala Lumpur' when you search for 'kual lum', though the score will be much lower:

    {
      "query": {
        "multi_match": {
          "fields": [
            "city"
          ],
          "query": "kual lum",
          "type": "best_fields",
          "operator": "or",
          "fuzziness": "AUTO"
        }
      }
    }

Now you can adjust the fuzziness value to suit your needs (try setting it to 2, which is the maximum Elasticsearch supports). It does the trick, but be mindful of the value, as it can affect your search performance.
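To get a feel for what `AUTO` allows, here is a small sketch of its documented default thresholds: terms of 0-2 characters must match exactly, terms of 3-5 characters allow one edit, and longer terms allow two. The function names are hypothetical, and plain Levenshtein distance is used as an approximation, since Elasticsearch by default also counts a swap of two adjacent characters as a single edit.

```python
def auto_max_edits(term):
    """Edits allowed by the default AUTO fuzziness for a term of this length."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Compare the edits needed with the edits AUTO would allow for each query term
for query_term, indexed_term in [("koala", "kuala"), ("kua", "kuala"), ("lum", "lumpur")]:
    needed = levenshtein(query_term, indexed_term)
    allowed = auto_max_edits(query_term)
    print(query_term, indexed_term, "needed:", needed, "allowed:", allowed)
```

Under these rules a misspelling such as 'koala' is within reach of 'kuala', while a short prefix such as 'lum' is three edits away from 'lumpur', which is more than even a fuzziness of 2 permits.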

You should avoid wildcard queries, as they are quite resource-heavy.

Another way would be to treat each word as a separate search term and pass each one to its own 'should' query.
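One possible shape for that per-word approach, sketched under a few assumptions: every word has to match (`must`), and each word may match either fuzzily or as a prefix (`should`). The `city` field comes from the example above, and the function name `build_per_word_query` is hypothetical. The lowercasing assumes the field uses the default standard analyzer, because `prefix` is a term-level query and its input is not analyzed.

```python
import json


def build_per_word_query(text, field):
    """Require every word, letting each one match either fuzzily or as a prefix."""
    per_word = []
    for word in text.split():
        per_word.append({
            "bool": {
                "should": [
                    # Typo-tolerant match on the whole word
                    {"match": {field: {"query": word, "fuzziness": "AUTO"}}},
                    # Exact prefix match; lowercased by hand (assumes standard analyzer)
                    {"prefix": {field: {"value": word.lower()}}},
                ],
                "minimum_should_match": 1,
            }
        })
    # All per-word clauses must be satisfied
    return {"query": {"bool": {"must": per_word}}}


per_word_body = build_per_word_query("kua lum", "city")
print(json.dumps(per_word_body, indent=2))
```

Prefix clauses can still be costly on large indices, so it is worth measuring this against your own data before relying on it.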

Zaid Warsi
  • "try making it 4 or 5" - ES supports fuzziness of 0,1,2 only (https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness) – Sahil Gupta Aug 21 '20 at 12:12
  • Thanks, @SahilGupta, for pointing that out. I was on a much older version. – Zaid Warsi Aug 21 '20 at 12:14
  • Thanks for your answer. This was the first thing I tried, but it is not true prefix matching. It uses the default fuzziness rules, so it only starts working once len-2 characters have been typed. Imagine the city were called 'kualalalalala lumpur': I still want it to be found with the 'kua lum' query. I need _either_ prefix matching or fuzzy search; the prefix itself cannot be misspelled. Could you please give an example of a 'should' query? I tried it myself but couldn't get it to work with 'multi_match' and 'wildcard'. – intelligentpotato Aug 21 '20 at 13:54
  • @intelligentpotato Any text field in ES gets analyzed and tokenized before it is stored in an inverted index. When you store "Kuala Lumpur" in your index, it gets tokenized into "Kuala" and "Lumpur". When you search for "Kuala Lumpur", both words are looked up in the inverted index. I updated my answer to OR the words in the search string together. The 'or' operator makes sure you get a response if any of the words matches. Hope that helps. – Zaid Warsi Aug 21 '20 at 14:02
  • Yep, I used "operator": "and". It still works the same way: len-2 only for each word, and words can be looong :( – intelligentpotato Aug 21 '20 at 14:09
  • @intelligentpotato Check the edited answer: "operator":"and" works like a logical AND on the words. You need to use "operator":"or". – Zaid Warsi Aug 21 '20 at 14:12