elasticsearch: analyzer to match all TEXT regardless of non-alpha symbols

Question

I'm struggling to find the right analyzer combination for a text field. I need all words to be present and to match (although it would be nice to ignore stopwords), regardless of punctuation.

For example: "a pretty dog named bart" and "a pretty dog, named bart" should both return the doc, but "a pretty dog" should not.

I thought about saving both (or multiple) phrases in the field and using field.keyword, but there could be many permutations of symbols, and I don't think this is the smartest approach.

I know you can't add an analyzer to the "keyword" data type. Is there another setup that would make more sense?

Currently I have it set up with:

'custom_char_filter' => [
    'type' => "mapping",
    'mappings' => [
        ". => ",
        ", => "
    ]
]

'custom_analyzer' => [
    'type' => 'standard',
    'stopwords' => '_english_',
    'char_filter' => [
        'custom_char_filter'
    ],
],
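The underlying problem with a "text" field is that analysis splits the value into separate tokens, and a query only checks that the query's tokens occur in the document. Nothing rejects a document that has extra words. The rough PHP simulation below illustrates this. It is a hypothetical approximation (lowercase, split on non-alphanumerics, no stopword removal), not Elasticsearch's real standard tokenizer, and `matchesAllTerms` only mimics a match query with "operator": "and".

```php
<?php
// Rough stand-in for text-field analysis: lowercase, then split on anything
// that is not a letter or digit. This is NOT Elasticsearch's standard
// tokenizer, and it does not remove stopwords; it is only an illustration.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Mimics a match query with "operator": "and": every query token must
// appear somewhere in the document. Extra document tokens are ignored.
function matchesAllTerms(string $document, string $query): bool
{
    $docTokens = tokenize($document);
    foreach (tokenize($query) as $token) {
        if (!in_array($token, $docTokens, true)) {
            return false;
        }
    }
    return true;
}

$doc = 'a pretty dog named bart';
var_dump(matchesAllTerms($doc, 'a pretty dog, named bart')); // bool(true)
var_dump(matchesAllTerms($doc, 'a pretty dog'));             // bool(true): the unwanted match
```

Punctuation is already dropped by the tokenizer, so the comma was never the real obstacle; the missing piece is a check that the whole value is equal.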

Answer (score 0, answered Apr 13 '22 at 18:39)

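I was able to accomplish this with a normalizer on a "keyword" field. A keyword field indexes the entire value as a single term, so a query only matches when the whole value is identical. The normalizer removes the punctuation and lowercases the value before that comparison is made.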

The normalizer:

'normalizer' => [
    'custom_normalizer' => [
        'type' => 'custom',
        'char_filter' => ['custom_char_filter'],
        'filter' => ['lowercase'],
    ]
]

The char filter:

'char_filter' => [
    'custom_char_filter' => [
        'type' => "mapping",
        'mappings' => [
            ". => ",       // remove periods
            ", => ",       // remove commas
            "- =>\\u0020", // replace dashes with a space
        ]
    ]
],
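To make the effect of this normalizer concrete, here is a small PHP sketch that applies the same three character mappings and then lowercases, and compares two values the way a single-term keyword field effectively does. The function names are hypothetical, and `strtolower` is an ASCII-only approximation of Elasticsearch's Unicode-aware lowercase filter.

```php
<?php
// Hypothetical PHP stand-in for custom_normalizer: apply the same mappings
// as custom_char_filter, then lowercase (ASCII-only approximation).
function normalize(string $value): string
{
    $mapped = strtr($value, [
        '.' => '',  // ". => "        remove periods
        ',' => '',  // ", => "        remove commas
        '-' => ' ', // "- =>\\u0020"  dash becomes a space
    ]);
    return strtolower($mapped);
}

// A keyword field stores the whole normalized value as ONE term, so a stored
// value and a query value match only if their normalized forms are identical.
function keywordMatches(string $stored, string $query): bool
{
    return normalize($stored) === normalize($query);
}

var_dump(normalize('A pretty dog, named Bart.'));                                 // string(23) "a pretty dog named bart"
var_dump(keywordMatches('a pretty dog named bart', 'a pretty dog, named bart')); // bool(true)
var_dump(keywordMatches('a pretty dog named bart', 'a pretty dog'));             // bool(false)
```

Because only characters are rewritten and the value is never split into words, extra or missing words always break the match, which is exactly the behavior the question asks for.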

My field's mapping:

'my_field' => [
    'type' => 'keyword',
    'normalizer' => 'custom_normalizer'
],
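The three fragments above belong in different parts of the index definition: the char filter and normalizer go under settings.analysis, and the field goes under mappings. A sketch of the combined parameters for elasticsearch-php's `$client->indices()->create()` is below. It assumes Elasticsearch 7 or later (typeless mappings with a top-level "properties"), and "my_index" and `createIndexParams` are hypothetical names.

```php
<?php
// Sketch: combined create-index parameters for elasticsearch-php.
// Assumes ES 7+ typeless mappings; 'my_index' is a hypothetical index name.
function createIndexParams(string $index): array
{
    return [
        'index' => $index,
        'body'  => [
            'settings' => [
                'analysis' => [
                    'char_filter' => [
                        'custom_char_filter' => [
                            'type'     => 'mapping',
                            'mappings' => [
                                '. => ',        // remove periods
                                ', => ',        // remove commas
                                '- =>\\u0020',  // dash becomes a space
                            ],
                        ],
                    ],
                    'normalizer' => [
                        'custom_normalizer' => [
                            'type'        => 'custom',
                            'char_filter' => ['custom_char_filter'],
                            'filter'      => ['lowercase'],
                        ],
                    ],
                ],
            ],
            'mappings' => [
                'properties' => [
                    'my_field' => [
                        'type'       => 'keyword',
                        'normalizer' => 'custom_normalizer',
                    ],
                ],
            ],
        ],
    ];
}

// Print the JSON body that would be sent to Elasticsearch.
echo json_encode(createIndexParams('my_index')['body'], JSON_PRETTY_PRINT), PHP_EOL;

// With an already configured elasticsearch-php client (not shown):
// $client->indices()->create(createIndexParams('my_index'));
```

In PHP single quotes, '- =>\\u0020' produces the literal text `- =>\u0020`, the same string as the double-quoted "- =>\\u0020" used in the answer, so Elasticsearch receives the `\u0020` escape and maps the dash to a space.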

Now "a pretty dog named bart" and "a pretty dog, named bart" both match, while "a pretty dog" does not.
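For searching, a term query on the field is enough: Elasticsearch applies a keyword field's normalizer to term-level queries such as term (and to match queries) as well as at index time, so the comma in the query value below is removed before comparison. This is a sketch for elasticsearch-php's `$client->search()`; "my_index" and `exactValueQuery` are hypothetical names.

```php
<?php
// Sketch: search parameters for elasticsearch-php's $client->search().
// 'my_index' is hypothetical. my_field's normalizer is also applied to the
// query value, so punctuation in the query does not prevent a match.
function exactValueQuery(string $index, string $value): array
{
    return [
        'index' => $index,
        'body'  => [
            'query' => [
                'term' => [
                    'my_field' => $value,
                ],
            ],
        ],
    ];
}

$params = exactValueQuery('my_index', 'a pretty dog, named bart');
echo json_encode($params['body']), PHP_EOL;
// {"query":{"term":{"my_field":"a pretty dog, named bart"}}}

// With an already configured elasticsearch-php client (not shown):
// $response = $client->search($params);
```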

