
I'm using the serializer to get fields, and the Stempel plugin for Polish-language search in Elasticsearch. I'm trying to get something like the following example to work, but without success:

https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html#asciifolding-token-filter

Here is my config:

fos_elastica:
    serializer: ~
    clients:
        default: { host: 127.0.0.1, port: 9200 }
    indexes:
        bpo:
            settings:
                index:
                    analysis:
                        analyzer:
                            folding:
                                tokenizer: standard
                                filter: [standard, lowercase, asciifolding, polish_stem]
        types:
            company:
                properties:
                    name:
                        type: string
                        analyzer: standard
                        fields:
                            folded:
                                type: string
                                analyzer: folding
                serializer:
                    groups: [elastica]
                    version: '1.1'
                    serialize_null: true
                persistence:
                    driver: orm
                    model: AppBundle\Entity\Company
                    repository: AppBundle\Repository\CompanyRepository
                    provider: ~
                    finder: ~

Then I check the analyzer:

$ curl "127.0.0.1:9200/bpo/_analyze?analyzer=folding&text=spółka&pretty"
{
  "tokens" : [ {
    "token" : "spᅢ뺴",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "ツ",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "<KATAKANA>",
    "position" : 1
  }, {
    "token" : "ka",
    "start_offset" : 6,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

The same happens even when trying to get ß ⇒ ss:

$ curl "127.0.0.1:9200/bpo/_analyze?analyzer=folding&text=ß&pretty"
{
  "tokens" : [ {
    "token" : "ᅢ゚",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<HANGUL>",
    "position" : 0
  } ]
}

When I query from the browser, "spółka" returns the correct data, but "spolka" returns nothing.

Do I need a filter or something?

1 Answer


I solved this problem by changing the analyzer name from "folding" to "default". This works for me.

Working configuration:

fos_elastica:
    serializer: ~
    clients:
        default: { host: 127.0.0.1, port: 9200 }
    indexes:
        bpo:
            settings:
                index:
                    analysis:
                        analyzer:
                            default:
                                tokenizer: standard
                                filter: [standard, lowercase, asciifolding, polish_stem]
    Thanks, still useful in 2020. I've been using this tip for Sylius and Bitbag ElasticSearchBundle; however, the configuration didn't need the `index:` key (between `settings:` and `analysis:`). – Anybug Jan 29 '20 at 22:50
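
For reference, the variant described in the comment above might look roughly like the sketch below. It is only an illustration: it assumes the same `fos_elastica` layout and the `bpo` index name from the answer, and it simply places `analysis:` directly under `settings:`. The exact nesting in a Sylius/Bitbag setup is not shown in this thread, so treat the surrounding keys as assumptions.

```yaml
# Sketch only: same "default" analyzer as in the answer, but without the
# intermediate `index:` key, as the comment describes.
# The `bpo` index name and the surrounding keys are carried over from the
# answer's config and may differ in other bundles.
fos_elastica:
    indexes:
        bpo:
            settings:
                analysis:
                    analyzer:
                        default:
                            tokenizer: standard
                            filter: [standard, lowercase, asciifolding, polish_stem]
```

The index has to be recreated and repopulated after changing analyzer settings for the new analysis chain to apply to existing documents.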