Custom phonetic analysis in Elasticsearch

Question

In our service we are trying to port a customization of the "nysiis" phonetic algorithm to Elasticsearch.

Our algorithm performs this name transformation:

Given lists of surnames and first names, such as "[Smith]" and "[John]", it applies the "nysiis" phonetic encoder to the first surname and appends the lowercase initial of the first first name.

Hence:

nysiis(Surnames[0]) + lower(Firstnames[0][0])

So the result for "John Smith" would be "SNATHj".
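
For illustration, the transformation can be sketched in Python as follows. This is only a sketch: the function name `phonetic_block` is hypothetical, and the `encode` argument stands in for our customized nysiis encoder, which is not shown here. The stub encoder simply returns the value quoted above for "Smith".

```python
from typing import Callable, Sequence


def phonetic_block(
    surnames: Sequence[str],
    firstnames: Sequence[str],
    encode: Callable[[str], str],
) -> str:
    """Return encode(surnames[0]) + lower(firstnames[0][0]).

    `encode` is a placeholder for the customized nysiis encoder.
    """
    if not surnames or not firstnames or not firstnames[0]:
        raise ValueError("need at least one surname and one non-empty first name")
    return encode(surnames[0]) + firstnames[0][0].lower()


def stub_encode(name: str) -> str:
    # Stand-in only: returns the encoding quoted in the question for "Smith".
    return {"Smith": "SNATH"}[name]


print(phonetic_block(["Smith"], ["John"], stub_encode))  # SNATHj
```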

I know ES supports the nysiis token filter (https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html), but I wonder what the best way is to implement a custom analyzer that automatically performs the above transformation.

Note: I guess, if needed, we can already provide ES with a simple structure:

{"surname": "Smith", "initial_first_name": "j"}
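
One possible arrangement for that structure, sketched as Python dictionaries holding the request bodies, is to analyze "surname" with the plugin's phonetic filter and keep "initial_first_name" as an exact-match field, then require both to match at search time. Several things here are assumptions: the names `nysiis_filter` and `surname_nysiis` are made up, the mapping syntax follows recent Elasticsearch versions (older ones differ), and the stock "nysiis" encoder is used, so it would not reproduce our customization. It also matches on two fields rather than producing the single concatenated token.

```python
# Index body: requires the analysis-phonetic plugin to be installed.
# "nysiis_filter" and "surname_nysiis" are hypothetical names.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "nysiis_filter": {
                    "type": "phonetic",
                    "encoder": "nysiis",  # stock encoder, not the customized one
                    "replace": True,      # keep only the phonetic token
                }
            },
            "analyzer": {
                "surname_nysiis": {
                    "type": "custom",
                    "tokenizer": "keyword",  # whole surname as a single token
                    "filter": ["nysiis_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "surname": {"type": "text", "analyzer": "surname_nysiis"},
            "initial_first_name": {"type": "keyword"},
        }
    },
}


def block_query(surname: str, initial: str) -> dict:
    """Build a search body matching records in the same phonetic block."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # match: the surname is analyzed with the field's analyzer
                    {"match": {"surname": surname}},
                    # term: exact match on the lowercase initial
                    {"term": {"initial_first_name": initial.lower()}},
                ]
            }
        }
    }
```

With this layout, `block_query("Smith", "J")` would be sent as the search body, and the surname would be phonetically encoded by Elasticsearch itself at search time.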

The better question is what do you want to do with the combined values afterwards? — Andrei Stefan, Jul 13 '16 at 10:15
An external, ML-based library has been trained on exactly this transformation, so I need to be able to search for all the records matching this particular form of phonetic block. To answer your specific question: given a surname and a first name, I would transform them at search time and use the result to match all the records falling into the same phonetic block. — Sam, Jul 13 '16 at 14:25

0 Answers