Let's say I have 5 film titles:

  • Sans Soleil
  • Sansa
  • So Is This
  • Sol Goode
  • Sole Survivor

I want to implement an auto-complete search field with this expected behavior:

  • "Sans" > Sans Soleil, Sansa
  • "Sans so" > Sans Soleil
  • "So" > So Is This, Sol Goode, Sole Survivor
  • "So Is" > So Is This
  • "Sol" > Sol Goode, Sole Survivor, Sans Soleil

This use case seems common enough that many people must have solved it, but I can't get it to work properly and can't find any answer or documentation that helps. This is my current model:

class Film < Media
  include Tire::Model::Search
  include Tire::Model::Callbacks

  settings  :analysis => {
              :filter => {
                :title_ngram  => {
                  "type"      => "edgeNGram",
                  "min_gram"  => 2,
                  "max_gram"  => 8,
                  "side"      => "front" }
              },
              :analyzer => {
                :title_analyzer => {
                  "tokenizer"    => "lowercase",
                  "filter"       => ["title_ngram"],
                  "type"         => "custom" }
              }
            } do
    mapping do
      indexes :title, :type => 'string', :analyzer => 'title_analyzer'
      indexes :int_english_title, :type => 'string', :analyzer => 'title_analyzer'
    end
  end
end

And how the query is handled in my search_controller:

search = Tire.search ['books', 'films', 'shows'], :load => true, :page => 1, :per_page => 10 do |s|
  s.query do |query|
    query.string "title:#{params[:search]}"
  end
end
@results = search.results

This produces some strange behavior:

  • "Sans so" returns "Sansa, Sans Soleil, So Is This" in that order.
  • "So is" returns "Sol Goode, Sans Soleil, Sole Survivor, So Is This" in that order.
j0k
gibson
  • Different approach to the same problem in the latest [railscast(pro)](http://railscasts.com/episodes/399-autocomplete-search-terms) – Andreas Lyngstad Jan 02 '13 at 14:44
  • Interesting, have you seen the episode? If it solves my exact use-case, being that it's able to properly sort ngram hits on titles with multiple words, I might consider subscribing. – gibson Jan 02 '13 at 20:35
  • I have seen it. It does not solve your exact problem, but it uses a different approach. If you develop Rails apps for money, RailsCasts is a huge timesaver, and in my case I save the $9 on the first day of every month. – Andreas Lyngstad Jan 04 '13 at 08:03

2 Answers

I think you might achieve what you want with the match query set to type:"phrase_prefix". Most, but not all, of your examples would work.

With Ngrams, you have much finer control over the process, but they have rather high recall (they usually return more results than you want), and you have to fight it. That's the "strange behaviour" you observe with multiple query terms ("Sans so"), because they are effectively executed as a Sans OR so query.
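To make the OR behaviour concrete, here is a rough plain-Ruby model of the analyzer from the question (lowercase letter tokens, front edge n-grams of 2 to 8 characters). It is only a sketch: the helper names `analyze` and `search` are made up for illustration, it applies the same analyzer to the query as to the titles, and it ignores scoring and how query_string handles fields, so the result lists are broader than what Elasticsearch actually returned in the question.

```ruby
TITLES = ["Sans Soleil", "Sansa", "So Is This", "Sol Goode", "Sole Survivor"]

# Hypothetical stand-in for title_analyzer: split on non-letters, lowercase,
# then expand each token into front edge n-grams of length 2..8.
def analyze(text)
  text.downcase.scan(/[a-z]+/).flat_map do |token|
    (2..[8, token.length].min).map { |n| token[0, n] }
  end.uniq
end

# Boolean matching only (no relevance scoring): :or needs any query gram
# to be present in the title, :and needs all of them.
def search(query, operator)
  terms = analyze(query)
  TITLES.select do |title|
    indexed = analyze(title)
    if operator == :and
      terms.all? { |t| indexed.include?(t) }
    else
      terms.any? { |t| indexed.include?(t) }
    end
  end
end

puts search("Sans so", :or).inspect   # every title shares at least the "so" gram
puts search("Sans so", :and).inspect  # only titles containing all of sa, san, sans, so
```

In this model the "so" gram alone is enough to match every title under OR, while AND leaves only "Sans Soleil".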

Try using the default_operator: "AND" option (see Tire's query_string_test.rb), or rather the match query (see Tire's match_query_test.rb) with the operator: "AND" option.
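As a sketch of what these two options look like on the wire, the snippet below builds the request body for a match query as a plain Ruby hash. The helper `autocomplete_query` is hypothetical, and the `type` key inside match reflects the Elasticsearch versions current when this thread was written; newer releases expose the same behaviour as a separate match_phrase_prefix query.

```ruby
require 'json'

# Hypothetical helper: builds the body of a match query on one field.
# In Tire's DSL this would presumably be written as
#   query.match :title, text, :type => 'phrase_prefix'
# (an assumption; check Tire's match_query_test.rb for the exact form).
def autocomplete_query(field, text, options = {})
  { :query => { :match => { field => { :query => text }.merge(options) } } }
end

# Treat the input as a phrase whose last word is a prefix.
phrase_prefix = autocomplete_query(:title, "sans so", :type => "phrase_prefix")

# Require every analyzed term to be present instead of any of them.
all_terms = autocomplete_query(:title, "sans so", :operator => "and")

puts JSON.pretty_generate(phrase_prefix)
puts JSON.pretty_generate(all_terms)
```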

There are some articles about autocomplete, Tire and Ngrams available:

karmi
  • As you correctly point out, the handling of whitespace was the main issue. I had tried to use `default_operator: "AND"` without much success, but match with `type:"phrase_prefix"` seems to do the trick! Do you know why query_string with `AND` wouldn't work? I also tweaked the mapping of each index to utilize a separate `:index_analyzer` and `:search_analyzer`. In addition to the articles you linked I can also recommend reading this one, it thoroughly breaks down the search process and made things a bit clearer: http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch – gibson Jan 03 '13 at 11:11
  • With the "AND" operator, I think the "sans so" query shouldn't return "So Is This" -- because the "sans" part does not break into any ngram in "So Is This", unless I'm mistaken. – karmi Jan 03 '13 at 15:25
  • The `phrase_prefix` type for `match` should actually be quite good for simple autocompletion. Of course, as noted, with Ngrams, you gain much more flexibility and higher recall. – karmi Jan 03 '13 at 15:28
  • Splitting the `index_` and `search_` analyzer is a good idea, because your query then won't be tokenized into ngrams -- which is what you want, the query a person does on your site is already "pseudo-ngrammed", because she writes only "partial words". – karmi Jan 03 '13 at 15:30
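
The effect of separate index and search analyzers can be approximated in plain Ruby. This is a simplified sketch, not the actual Elasticsearch behaviour: `index_terms`, `search_terms`, and `autocomplete` are hypothetical names, titles are expanded into edge n-grams (2 to 8 characters) at index time, the query is only lowercased and split into words, and every query word must be present (AND). Scoring and result ordering are ignored.

```ruby
TITLES = ["Sans Soleil", "Sansa", "So Is This", "Sol Goode", "Sole Survivor"]

# Index-time analysis: lowercase letter tokens expanded into front edge n-grams.
def index_terms(text)
  text.downcase.scan(/[a-z]+/).flat_map do |token|
    (2..[8, token.length].min).map { |n| token[0, n] }
  end.uniq
end

# Search-time analysis: lowercase letter tokens only, no n-gram expansion,
# because the user's partial words already act as prefixes.
def search_terms(text)
  text.downcase.scan(/[a-z]+/)
end

# A title matches when every query word is one of its indexed grams.
def autocomplete(query)
  terms = search_terms(query)
  TITLES.select { |title| (terms - index_terms(title)).empty? }
end

["Sans", "Sans so", "So Is", "Sol"].each do |q|
  puts "#{q.inspect} > #{autocomplete(q).join(', ')}"
end
```

Under these assumptions "Sans so" and "So Is" each narrow to a single title, and "Sol" still finds "Sans Soleil" through the grams of "Soleil".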

Try the following:

search = Tire.search ['books', 'films', 'shows'], :load => true, :page => 1, :per_page => 10 do |s|
  s.query do |q|
    q.boolean do |b|
      b.must { |m| m.string params[:search] }
    end
  end
end
Salil
  • Not specifying "title:" in the string searches the _all field, which bypasses my edgeNGram analyzer, so searching for "Sol" will only return "Sol Goode". I tried adding "title:#{params[:search]}" to your block, but sadly it keeps returning sub-optimal hits. – gibson Jan 02 '13 at 10:42
  • have you tried `b.must{|m| m.string "title:#{params[:search]}"}` – Salil Jan 02 '13 at 10:44
  • Yes, when I try that I get the same results as before. – gibson Jan 02 '13 at 10:50
  • @Salil No need to wrap the query in the boolean query -- it makes no difference. Also, as @gibson notes, not specifying the `title:` query qualifier will produce totally incorrect results. – karmi Jan 03 '13 at 08:46