
How do I add the following german_phonebook analyzer to Elasticsearch using elastic4s?

        "index": {
            "analysis": {
                "analyzer": {
                    "german": {
                        "filter": [
                            "lowercase",
                            "german_stop",
                            "german_normalization",
                            "german_stemmer"
                        ],
                        "tokenizer": "standard"
                    },
                    "german_phonebook": {
                        "filter": [
                            "german_phonebook"
                        ],
                        "tokenizer": "keyword"
                    },
                    "mySynonyms": {
                        "filter": [
                            "lowercase",
                            "mySynonymFilter"
                        ],
                        "tokenizer": "standard"
                    }
                },
                "filter": {
                    "german_phonebook": {
                        "country": "CH",
                        "language": "de",
                        "type": "icu_collation",
                        "variant": "@collation=phonebook"
                    },
                    "german_stemmer": {
                        "language": "light_german",
                        "type": "stemmer"
                    },
                    "german_stop": {
                        "stopwords": "_german",
                        "type": "stop"
                    },
                    "mySynonymFilter": {
                        "synonyms": [
                            "swisslift,lift"
                        ],
                        "type": "synonym"
                    }
                }
            },

The core question is: which elastic4s token filter should I use for the german_phonebook filter, which is of type icu_collation?

...

Following the answer below, I came up with this code:

  case class GPhonebook() extends TokenFilterDefinition {
    // Assumption: elastic4s writes filterType as the filter's "type" field,
    // so it must be the Elasticsearch filter type and "type" is not set again in build.
    val filterType = "icu_collation"
    def name = "german_phonebook"
    override def build(source: XContentBuilder): Unit = {
      // The tokenizer belongs to the analyzer definition, not to the filter.
      source.field("language", "de")
      source.field("country", "CH")
      source.field("variant", "@collation=phonebook")
    }
  }

The analyzer definition looks like this now:

  CustomAnalyzerDefinition(
      "german_phonebook",
      KeywordTokenizer("myKeywordTokenizer2"),
      GPhonebook()
  )
TeTeT
  • Can you post what you already tried? – Onilton Maciel Feb 23 '16 at 15:53
  • Here's the current code: http://pastebin.com/i1FYNyUH I can create an index and setup the synonym analyzer. I don't know how to proceed here with the german_phonebook filter in the analyzer definition. – TeTeT Feb 24 '16 at 15:01

1 Answer

What you really want is some way to say

CustomTokenFilter("german_phonebook") or BuiltInTokenFilter("german_phonebook"), but you can't yet (I'll add that).

So for now, you need to extend TokenFilterDefinition.

For example, something like:

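case class GPhonebook() extends TokenFilterDefinition {
  // The filter type as Elasticsearch knows it (icu_collation in the question's settings)
  val filterType = "icu_collation"
  // The name the analyzer uses to refer to this filter
  def name = "german_phonebook"
  override def build(source: XContentBuilder): Unit = {
    // set extra params in here, e.g. source.field("language", "de")
  }
}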
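As a rough guide to which "extra params" belong in build, here is a plain-Scala sketch of the JSON object the filter should end up as in the index settings. It does not use elastic4s at all; PhonebookFilterSketch and its members are hypothetical names made up for this illustration, and the values are copied from the settings in the question.

```scala
// Plain-Scala sketch: no elastic4s or Elasticsearch classes are used here.
// PhonebookFilterSketch and its members are hypothetical names for illustration.
object PhonebookFilterSketch {
  // Name under which the filter is registered in "analysis.filter".
  val name: String = "german_phonebook"

  // Parameters copied from the settings in the question.
  // "tokenizer" is absent on purpose: it is set on the analyzer, not on the filter.
  val params: Seq[(String, String)] = Seq(
    "type"     -> "icu_collation",
    "language" -> "de",
    "country"  -> "CH",
    "variant"  -> "@collation=phonebook"
  )

  // Renders the parameters as a compact JSON object.
  // No escaping is done, which is acceptable only because these
  // values contain no quotes or backslashes.
  def toJson: String =
    params.map { case (k, v) => "\"" + k + "\":\"" + v + "\"" }.mkString("{", ",", "}")
}
```

Printing PhonebookFilterSketch.toJson gives a string that can be compared against the german_phonebook entry returned by the index's settings once the real filter is in place.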
sksamuel