In our service we are trying to port a customization of the "nysiis" phonetic algorithm to Elasticsearch.

Our algorithm performs this name transformation:

Given a list of "Surnames" and a list of "Firstnames", such as "[Smith]" and "[John]", it applies the "nysiis" phonetic encoder to the first surname and appends the lowercase initial of the first first name.

Hence:

nysiis(Surnames[0]) + lower(Firstnames[0][0])

So the result with "John Smith" would be "SNATHj"
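
To make the transformation concrete, here is a minimal Python sketch of it. The function name `phonetic_block` and the `encode` parameter are hypothetical, introduced only for illustration; `encode` stands for whatever function implements our customized nysiis encoder, which is not reproduced here. The `fake_encoder` below is a stand-in and is NOT nysiis.

```python
from typing import Callable, Sequence


def phonetic_block(
    surnames: Sequence[str],
    firstnames: Sequence[str],
    encode: Callable[[str], str],
) -> str:
    """Compute encode(Surnames[0]) + lower(Firstnames[0][0])."""
    if not surnames or not firstnames or not firstnames[0]:
        raise ValueError("need at least one surname and one non-empty first name")
    # Only the first surname and the first letter of the first first name are used.
    return encode(surnames[0]) + firstnames[0][0].lower()


def fake_encoder(name: str) -> str:
    """Placeholder encoder for demonstration only: it just upper-cases the name."""
    return name.upper()


print(phonetic_block(["Smith"], ["John"], fake_encoder))  # prints SMITHj
```

With the real customized encoder passed as `encode`, the same call would yield the "SNATHj" key described above.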

I know ES supports the nysiis token filter https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html

but I wonder what the best way is to implement a custom analyzer that automatically performs the above transformation.


Note: I guess, if needed, we can already provide ES with a simple structure:

{"surname": "Smith", "initial_first_name": "j"}
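
One direction I can imagine with that structure, sketched as Python dicts that would be sent to ES as JSON: analyze "surname" with the phonetic plugin's `phonetic` token filter (encoder `nysiis`) and match "initial_first_name" exactly. The names `nysiis_filter`, `surname_nysiis`, and the two helper functions are hypothetical. This assumes the analysis-phonetic plugin is installed, that the "surname" field is mapped to use the `surname_nysiis` analyzer (the mapping is omitted because its syntax depends on the ES version), and that the stock encoder agrees with our customized one, which is not guaranteed.

```python
import json


def build_index_settings() -> dict:
    """Index settings defining a nysiis-based analyzer (assumed names)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # 'phonetic' filter type comes from the analysis-phonetic plugin.
                    "nysiis_filter": {
                        "type": "phonetic",
                        "encoder": "nysiis",
                        "replace": True,  # keep only the encoded token
                    }
                },
                "analyzer": {
                    "surname_nysiis": {
                        "type": "custom",
                        "tokenizer": "keyword",  # treat the whole surname as one token
                        "filter": ["nysiis_filter"],
                    }
                },
            }
        }
    }


def build_block_query(surname: str, initial: str) -> dict:
    """Query for records in the same phonetic block as (surname, initial)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # 'match' analyzes the input with the field's search analyzer.
                    {"match": {"surname": surname}},
                    {"term": {"initial_first_name": initial.lower()}},
                ]
            }
        }
    }


print(json.dumps(build_block_query("Smith", "J"), indent=2))
```

This keeps the two parts of the key in separate fields instead of one concatenated token, so it only reproduces the block membership, not the literal "SNATHj" string.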
Sam
  • The better question is what do you want to do with the combined values afterwards? – Andrei Stefan Jul 13 '16 at 10:15
  • An external, ML-based library has been trained on exactly this transformation, so I need to be able to search for all the records matching this particular form of phonetic block. To answer your specific question: given a surname and a first name, I would transform them at search time and use the result to match all the records falling into the same phonetic block. – Sam Jul 13 '16 at 14:25
