If I have two strings:

  • Doe, Joe
  • Doe, Jonathan

I want to implement a search such that:

  • "Doe" > "Doe, Joe", "Doe, Jonathan"
  • "Doe J" > "Doe, Joe", "Doe, Jonathan"
  • "Jon Doe" > "Doe, Jonathan"
  • "Jona Do" > "Doe, Jonathan"

Here's the code that I have:

settings analysis: {
  filter: {
    nameNGram: {
      type: "edgeNGram",
      min_gram: 1,
      max_gram: 20
    }
  },
  tokenizer: {
    non_word: {
      type: "pattern",
      pattern: "[^\\w]+"
    }
  },
  analyzer: {
    name_analyzer: {
      type: "custom",
      tokenizer: "non_word",
      filter: ["lowercase", "nameNGram"]
    }
  }
} do
  mapping do
    indexes :name, type: "multi_field", fields: {
      analyzed:   { type: "string", index: :analyzed, index_analyzer: "name_analyzer" }, # for indexing
      unanalyzed: { type: "string", index: :not_analyzed, include_in_all: false }        # for sorting
    }
  end
end

def self.search(params)
  tire.search(:page => params[:page], :per_page => 20) do
    query do
      string "name.analyzed:" + params[:query], default_operator: "AND"
    end
    sort do
      by "name.unanalyzed", "asc"
    end
  end
end

Unfortunately, this doesn't appear to work. The tokenizing looks right: for "Doe, Jonathan" I get tokens such as "d", "do", "doe", "j", "jo", "jon", "jona", etc. But if I search for "do AND jo", I get nothing back, whereas searching for "jona" returns "Doe, Jonathan". What am I doing wrong?
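
To make the token list above concrete, here is a minimal plain-Ruby sketch that mimics what the name_analyzer chain is described as doing (split on non-word characters, lowercase, emit edge n-grams). The helper name edge_ngrams is hypothetical and is not part of Tire or Elasticsearch; it only approximates the analyzer for illustration.

```ruby
# Hypothetical helper (not a Tire/Elasticsearch API): approximates the
# name_analyzer chain from the settings above in plain Ruby.
def edge_ngrams(text, min_gram: 1, max_gram: 20)
  # Split on runs of non-word characters, like the "non_word" pattern tokenizer.
  text.split(/[^\w]+/).reject(&:empty?).flat_map do |word|
    token = word.downcase                      # "lowercase" filter
    upper = [token.length, max_gram].min       # cap prefixes at max_gram
    (min_gram..upper).map { |n| token[0, n] }  # edge n-grams (prefixes)
  end
end

tokens = edge_ngrams("Doe, Jonathan")
puts tokens.inspect
# Both prefixes from the failing query exist among the indexed tokens.
puts tokens.include?("do") && tokens.include?("jo")
```

If this approximation is faithful, both "do" and "jo" are present on the index side, so the sketch can help separate index-time behavior from query-time behavior.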

zilla

1 Answer

You should probably use EdgeNGram only if you want to build an autocomplete. I suspect that what you want is a pattern tokenizer that separates words by commas.

Something like this:

"tokenizer": {
    "comma_pattern_token": {
         "type": "pattern",
         "pattern": ",",
         "group": -1
     }
 }

If I am mistaken and you need edgeNGrams for some other reason, then your problem is that your index analyzer is ignoring stop words (such as the word AND) and your search analyzer is not. You need to create a custom analyzer for your search_analyzer that does not include the stop word filter.
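
As a rough sketch of that second suggestion, the settings from the question could gain a separate search-side analyzer. The name name_search_analyzer is made up for illustration, and this assumes the Elasticsearch version in use accepts a search_analyzer option next to index_analyzer in the field mapping. The hashes below are plain Ruby and would still need to be passed to Tire's settings and mapping calls.

```ruby
# Hypothetical settings fragment; name_search_analyzer is an illustrative name.
ANALYSIS = {
  filter: {
    nameNGram: { type: "edgeNGram", min_gram: 1, max_gram: 20 }
  },
  tokenizer: {
    non_word: { type: "pattern", pattern: "[^\\w]+" }
  },
  analyzer: {
    # Index side: unchanged from the question.
    name_analyzer: {
      type: "custom",
      tokenizer: "non_word",
      filter: ["lowercase", "nameNGram"]
    },
    # Search side: same tokenizer, but no n-gram filter and no stop word filter.
    name_search_analyzer: {
      type: "custom",
      tokenizer: "non_word",
      filter: ["lowercase"]
    }
  }
}

# Assumed mapping options for the analyzed sub-field of :name.
NAME_ANALYZED_FIELD = {
  type: "string",
  index: :analyzed,
  index_analyzer: "name_analyzer",
  search_analyzer: "name_search_analyzer"
}
```

With a setup along these lines, query terms would only be split and lowercased, then compared against the prefixes produced at index time.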

Commander