I'm trying to get user-submitted queries for "Joe Frankles", "Joe Frankle", and "Joe Frankle's" to match the original text "Joe Frankle's". Right now we're indexing the field that contains this text with (Tire / Ruby format):

{ :type => 'string', :analyzer => 'snowball' }

and searching with:

query { string downcased_query, :default_operator => 'AND' }

I tried the following, without success:

          create :settings => {
              :analysis => {
                :char_filter => {
                   :remove_accents => {
                     :type => "mapping",
                     :mappings => ["`=>", "'=>"]
                   }
                },
                :analyzer => {
                  :myanalyzer => {
                    :type => 'custom',
                    :tokenizer => 'standard',
                    :char_filter => ['remove_accents'],
                    :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram']
                  }
                },
                :default => {
                  :type => 'myanalyzer'
                }
            }
          },
LMH

3 Answers

There are two official ways of handling possessive apostrophes:

1) Use the "possessive_english" stemmer as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

Example:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_stemmer"]
            }
        },
        "filter" : {
            "my_stemmer" : {
                "type" : "stemmer",
                "name" : "possessive_english"
            }
        }
    }
  }
}

You can use other stemmers or snowball in addition to the "possessive_english" filter if you like. This should work, but the code is untested.

2) Use the "word_delimiter" filter:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_word_delimiter"]
            }
        },
        "filter" : {
            "my_word_delimiter" : {
                "type" : "word_delimiter",
                "preserve_original": "true"
            }
        }
    }
  }
}

Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

Both approaches strip the trailing "'s" from a token.
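
For readers using Tire, as in the question, the first option could be expressed as Ruby hashes along the following lines. This is only a sketch: the method names and the names "my_analyzer", "my_possessive" and "my_snowball" are made up for illustration, and adding a snowball filter after the possessive filter is an assumption based on the question's wish to also match "Joe Frankles".

```ruby
# Sketch only: every name below is a placeholder, not something Tire or ES requires.

# Index settings: strip possessives first, then apply the snowball stemmer.
def possessive_settings
  {
    :analysis => {
      :filter => {
        :my_possessive => { :type => 'stemmer',  :name => 'possessive_english' },
        :my_snowball   => { :type => 'snowball', :language => 'English' }
      },
      :analyzer => {
        :my_analyzer => {
          :type      => 'custom',
          :tokenizer => 'standard',
          :filter    => ['standard', 'lowercase', 'my_possessive', 'my_snowball']
        }
      }
    }
  }
end

# Field mapping that points the field at the custom analyzer.
def name_field_mapping
  { :type => 'string', :analyzer => 'my_analyzer' }
end

# With Tire, the settings hash would be passed when the index is created, e.g.:
#   Tire.index('people') { create :settings => possessive_settings }
# ('people' is an assumed index name.)
```

Like the JSON above, this is untested against a live cluster; the analyze API is a quick way to check which tokens the analyzer actually produces.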

Simon Steinberger

I ran into a similar problem: the snowball analyzer alone didn't work for me. I don't know whether it's supposed to or not. Here's what I use:

properties: {
  name: {
    boost: 10,
    type:  'multi_field',
    fields: {
      name:      { type: 'string', index: 'analyzed', analyzer: 'title_analyzer' },
      untouched: { type: 'string', index: 'not_analyzed' }
    }
  }
}

analysis: {
  char_filter: {
    remove_accents: {
      type: "mapping",
      mappings: ["`=>", "'=>"]
    }
  },
  filter: {},
  analyzer: {
    title_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      char_filter: ['remove_accents'],
    }
  }
}

The analyze API (listed under the admin indices APIs) is also very useful when working with analyzers.
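
As a rough illustration of using the analyze API from Ruby, the sketch below builds an `_analyze` request with the analyzer and text passed as query parameters. The host, the index name "people" and the helper names are assumptions; the query-parameter form matches Elasticsearch versions from around the time of this question, while newer versions expect a JSON body instead.

```ruby
require 'uri'
require 'net/http'

# Builds the URI for an _analyze request against a given index.
# The default host is an assumption (a local single-node setup).
def analyze_uri(index, analyzer, text, host = 'http://localhost:9200')
  query = URI.encode_www_form('analyzer' => analyzer, 'text' => text)
  URI.parse("#{host}/#{index}/_analyze?#{query}")
end

# Sends the request and returns the raw JSON response body as a string.
# Needs a running cluster, so it is not called here.
def analyze(index, analyzer, text)
  Net::HTTP.get(analyze_uri(index, analyzer, text))
end

# Example: puts analyze('people', 'title_analyzer', "Joe Frankle's")
```

Comparing the tokens returned for "Joe Frankle's", "Joe Frankles" and "Joe Frankle" shows directly whether the three forms end up as the same term.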

Neil
Yeggeps
  • Really interesting. This looks like it just removes apostrophes so "Joe Frankles" would match "Joe Frankle's" but would "Joe Frankle" match "Joe Frankle's" with the above? – LMH Apr 25 '13 at 21:25
  • Yes, for me it does. I'm not 100% sure why, but this was the only way I could get it to work. I'm using Swedish snowball btw, not sure if that matters. – Yeggeps Apr 26 '13 at 10:04
  • Thanks, I gave this a try but it doesn't seem to be working. I updated the question above with the syntax. Any brilliant ideas? – LMH Apr 26 '13 at 15:36
  • It's not clear from your example if you did or not, but you would have to remove the custom filters I've got in my example. – Yeggeps Apr 26 '13 at 16:54
  • I did this, is this what you are referring to: :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram'] – LMH Apr 26 '13 at 20:18
  • Yep, I've updated my answer with what I'm using. I also think you should be using a [text](http://www.elasticsearch.org/guide/reference/glossary/#text) or [term](http://www.elasticsearch.org/guide/reference/glossary/#term) query since I don't think that query_string is analyzed? – Yeggeps Apr 27 '13 at 11:05

It looks like your query is searching the _all field, but your analyzer is applied only to the individual field. To enable this behavior for the _all field as well, simply make snowball your default analyzer.
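
In Tire's hash format, that suggestion might look like the minimal sketch below. It relies on Elasticsearch treating an analyzer registered under the name `default` as the index-wide default; the method name and the index name in the comment are assumptions.

```ruby
# Sketch: register snowball under the analyzer name "default" so that fields
# without an explicit analyzer, including _all, are analyzed with it.
def default_snowball_settings
  {
    :analysis => {
      :analyzer => {
        :default => { :type => 'snowball' }
      }
    }
  }
end

# Passed at index creation, e.g.:
#   Tire.index('people') { create :settings => default_snowball_settings }
# ('people' is an assumed index name.)
```

Note that `:default` sits inside `:analyzer`, and its `:type` names an analyzer type rather than another analyzer.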

Community
imotov