I am using Elasticsearch for full-text search in a Django application, with the elasticsearch_dsl library from PyPI as the interface to the cluster. I am trying to add a shingle filter to the analyzer, and I believe I have it working with the default values:

from elasticsearch_dsl import analyzer, tokenizer


main_analyzer = analyzer(
    'main_analyzer',
    tokenizer="standard",
    filter=[
        "lowercase",
        "stop",
        "porter_stem",
        "shingle"
    ]
)

I would like to change the defaults, for example setting max_shingle_size to 5 instead of the default 2. I cannot find the syntax for doing this, even after reading the documentation, the examples in the Git repository, and some of the source code.

Neil

1 Answer

You need to define a custom token filter and use it in your custom analyzer:

from elasticsearch_dsl import analysis

main_analyzer = analysis.analyzer(
    "main_analyzer",
    tokenizer="standard",
    filter=[
        "lowercase",
        "stop",
        "porter_stem",
        analysis.token_filter("my_shingle", "shingle", max_shingle_size=5)
    ]
)
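
To see what raising max_shingle_size changes, here is a rough pure-Python sketch of shingling. It is only an approximation for illustration, not the Elasticsearch implementation: the `shingles` helper and its parameter names are hypothetical, and the output order is a simplification.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Approximate a shingle token filter on a list of tokens (illustration only)."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Single tokens are kept alongside the shingles.
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a shingle of this size
            out.append(" ".join(tokens[start:end]))
    return out


# Hypothetical token stream, as it might look after lowercasing and stemming.
tokens = ["quick", "brown", "fox", "jump"]

default_shingles = shingles(tokens)            # sizes up to 2 (the default)
wide_shingles = shingles(tokens, max_size=5)   # sizes up to 5

print(default_shingles)
print(wide_shingles)
```

With the default, only word pairs are added to the single tokens. With a maximum of 5, longer runs such as "quick brown fox jump" are emitted as well, so the index grows accordingly.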
Val