
I'm new to Elasticsearch but have been reading about it for the last couple of days, trying to come up with the 'best' type of search for my application. I want to be able to match multiple terms against multiple results, but also get partial-word matches. Listed below is what I am currently using. It works great at finding results for all the words entered: for example, 'Michigan Creative VP' finds people who work for Michigan Creative and people whose title is VP Comm. But when I search for 'manage' instead of 'management', nothing comes up.

$params =
            [
                'index' => 'myindex',
                'type' => 'person',
                'body' =>
                    [
                        'from' => 0,
                        'size' => 500,
                        'query' =>
                            [
                                'fuzzy_like_this' =>
                                    [
                                        '_all' =>
                                            [
                                                'like_text' => $keywords,
                                                'fuzziness' => 0.5,
                                            ],
                                    ],
                            ],
                    ]
            ];

I've read about wildcard queries, but people say they are slow, and I am not sure they take into account every word in the search. Can someone please point me to the right search configuration for getting partial matches?

Doug T.
Wally Kolcz
  • In addition to Doug's crash course on relevance, if you're trying to build an autosuggest/typeahead feature, have a look at Elasticsearch Suggesters, in particular the Phrase Suggester: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html which can do a lot of what you're asking. – Peter Dixon-Moses Sep 27 '15 at 19:05

1 Answer


A couple of thoughts:

  1. The fuzzy_like_this query is built for more-like-this use cases. It's typically used for in-content recommendations. Is this what you're doing? For more typical search, I would expect to see match or multi_match queries. Have you tried these out?

  2. It appears the text is getting tokenized, and fuzzy_like_this is matching on the exact tokens, as traditional search would. For example:

    'Michigan Creative VP' finds people who work for Michigan Creative and VP Comm, but when I search for 'manage' instead of 'management' nothing comes up.

    This seems to indicate that one document contains the terms [vp] and [comm], and your query matches it because [vp] from the query is an exact match. The other document matches because [michigan] and [creative] are exact matches.

  3. Your fuzziness setting doesn't seem permissive enough to match the query manage to management. Using the formula here, you can calculate how many edits are allowed before a term is kept out of the search results:

    length(term) * (1.0 - fuzziness)

    which in this case means

    length(manage) * 0.5 == 6 * 0.5 or 3

    which seems to allow edits of up to 3 characters, whereas management adds 4 characters to manage, so it falls outside the limit.
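The arithmetic in point 3 can be checked with PHP's built-in levenshtein() function. This is only a sketch: the helper name maxAllowedEdits is hypothetical, and it simply applies the formula quoted above, which may not be exactly what a given Elasticsearch or Lucene version computes internally.

```php
<?php
// Hypothetical helper: applies the formula quoted above,
// length(term) * (1.0 - fuzziness), rounded down to whole edits.
function maxAllowedEdits(string $term, float $fuzziness): int
{
    return (int) floor(strlen($term) * (1.0 - $fuzziness));
}

$query   = 'manage';
$indexed = 'management';

$allowed  = maxAllowedEdits($query, 0.5);  // 6 * 0.5 = 3
$distance = levenshtein($query, $indexed); // 4 insertions: m, e, n, t

printf("allowed edits: %d, actual distance: %d\n", $allowed, $distance);
echo $distance <= $allowed ? "would match\n" : "would not match\n";
```

Under these assumptions the distance (4) exceeds the allowance (3), which is consistent with 'manage' returning nothing.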

Some bigger picture pointers:

  • Searching purely by edit distance doesn't play to the search engine's core strength. The search engine is better used when you take text and normalize it down to tokens using the analysis process. I'd suggest reading this post as a primer. We also talk about this at length in chapter 4 of my book, Relevant Search.

  • Once you understand analysis, a better solution to the general problem of matching management to manage might be stemming, which reduces terms to their root form before trying to match.

  • Based on how you think about your search matching rules, it sounds like you might want to set up test cases and use a test-driven approach to your search.
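As a rough illustration of the stemming pointer, the sketch below builds index-creation params that analyze a few text fields with Elasticsearch's built-in english analyzer, which includes a stemmer and should typically reduce both manage and management to the same root. Several things here are assumptions: the helper name buildPersonIndexParams, the field names (job_title, department, bio, guessed from the question's description), and the 1.x-era mapping syntax that matches the question's use of a 'person' type. Newer Elasticsearch versions use 'text' instead of 'string' and drop mapping types.

```php
<?php
// Hypothetical helper: builds index-creation params for the official PHP client.
// Field names are assumed from the question's description of the documents.
function buildPersonIndexParams(string $index): array
{
    // The built-in "english" analyzer lowercases, removes stop words, and stems.
    $stemmedText = ['type' => 'string', 'analyzer' => 'english'];

    return [
        'index' => $index,
        'body'  => [
            'mappings' => [
                'person' => [
                    'properties' => [
                        'job_title'  => $stemmedText,
                        'department' => $stemmedText,
                        'bio'        => $stemmedText,
                    ],
                ],
            ],
        ],
    ];
}

$params = buildPersonIndexParams('myindex');
// With a configured client: $client->indices()->create($params);
echo json_encode($params['body'], JSON_PRETTY_PRINT), "\n";
```

Existing documents would need to be reindexed after a mapping change like this, since analysis happens at index time as well as at query time.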

Doug T.
  • Wow, thanks for all the information. I have to admit that I am very new at this, and you gave me a lot of good information to sift through while I figure out what I need. Basically, I have pushed a bunch of user information to an index, including first name, last name, username, email address, department, job title, bio, and phone number. I would like the user to be able to type whatever they want into the search box and have Elasticsearch try to match it against some or all of the values of those document fields. – Wally Kolcz Sep 27 '15 at 17:23
  • I am trying to figure out how to substitute multi_match into the config params that I posted above so I can test it. – Wally Kolcz Sep 27 '15 at 17:23
  • Glad to help! I hate to sound like a shameless-plugger :) but Chapters 5 and 6 of Relevant Search probably cover multi_match in more depth than anywhere. And it happens to be today's Manning deal of the day https://twitter.com/ManningBooks/status/648135072961335296 – Doug T. Sep 27 '15 at 17:25
  • Oh I am definitely going to buy the book this week when I get paid :) Could you give me a hint on what I need to replace in the params to make it multi_match instead of fuzzy search? I tried to replace the part that starts with 'fuzzy_like_this' and ends with 'fuzziness' and it's throwing an error – Wally Kolcz Sep 27 '15 at 18:10
  • You might just try a plain match query: `'match' => ['_all' => $keywords]` – Doug T. Sep 27 '15 at 18:20
  • $params =['index' => 'my_index', 'type' => 'person', 'body' => ['from' => 0, 'size' => 500, 'query' => ["match"=> ["_all"=> 'manage'] ]]]; failed to find any results – Wally Kolcz Sep 27 '15 at 18:48
  • Tried $params =['index' => 'letsmeetup','type' => 'person','body' => ['from' => 0,'size' => 500, 'query' => ["multi_match" => ["query" => "manage","type" =>"best_fields","fields"=> [ "bio", "interest", "skills" ],"tie_breaker" => 0.3] ],]]; with 'manage' and it also failed to find any results but found results for exact match of 'php' – Wally Kolcz Sep 27 '15 at 19:01
  • Yeah unless "manage" exactly is in the search engine, that won't match. You'll need to figure out how to emit the root form from management. This is a matching/analysis prob not so much a query one – Doug T. Sep 27 '15 at 19:23
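
For reference, the multi_match attempt from the comments can be laid out as a small builder, shown below as a sketch only. The function name buildMultiMatchParams is hypothetical; the fields (bio, interest, skills), the best_fields type, and the tie_breaker value are taken from the comment above. As the last comment notes, this query shape by itself will not match 'manage' to 'management'; that depends on how the fields are analyzed at index time, for example with a stemming analyzer.

```php
<?php
// Hypothetical helper: builds search params in the shape used in the comments.
function buildMultiMatchParams(string $index, string $keywords): array
{
    return [
        'index' => $index,
        'type'  => 'person',
        'body'  => [
            'from'  => 0,
            'size'  => 500,
            'query' => [
                'multi_match' => [
                    'query'       => $keywords,
                    'type'        => 'best_fields',
                    'fields'      => ['bio', 'interest', 'skills'],
                    'tie_breaker' => 0.3,
                ],
            ],
        ],
    ];
}

$params = buildMultiMatchParams('myindex', 'manage');
// With a configured client: $results = $client->search($params);
echo json_encode($params['body']['query'], JSON_PRETTY_PRINT), "\n";
```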