I came across a forum post that describes a method of creating a Python UDF in Redshift: https://community.periscopedata.com/r/y715m2.
More info about Python UDFs in Redshift: https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html
I checked a number of outputs from the function (like select public.levenshtein('walk', 'cake')) and it works quite well.
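For anyone who wants to sanity-check outputs outside Redshift, here is a minimal pure-Python sketch of the Levenshtein distance. It is my assumption of what the UDF computes (the standard edit distance with unit costs), not the code from the linked post.

```python
def levenshtein(s: str, t: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn s into t (unit cost each)."""
    if len(s) < len(t):
        s, t = t, s  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, s_char in enumerate(s, start=1):
        current = [i]  # turning s[:i] into "" takes i deletions
        for j, t_char in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Same pair as the SQL check above.
print(levenshtein("walk", "cake"))
```

Comparing a handful of these values against select public.levenshtein(...) is a quick way to confirm the UDF behaves like the textbook definition.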
I am hoping to use this concept for fuzzy matching in joins between two tables on t1.first_name+last_name = t2.first_name+last_name.
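To make the join idea concrete, here is a rough Python sketch of the matching logic. The names edit_distance, fuzzy_join, and max_distance are hypothetical, the sample rows are made up, and the threshold is only a placeholder (it is exactly the value I am asking about). In Redshift the same comparison would presumably be a join condition on the UDF result.

```python
from typing import List, Tuple

Name = Tuple[str, str]  # (first_name, last_name)


def edit_distance(s: str, t: str) -> int:
    # Compact Levenshtein distance, standing in for public.levenshtein(s, t).
    row = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        diagonal, row[0] = row[0], i
        for j, b in enumerate(t, start=1):
            diagonal, row[j] = row[j], min(
                row[j] + 1,           # deletion
                row[j - 1] + 1,       # insertion
                diagonal + (a != b),  # substitution (free if equal)
            )
    return row[-1]


def fuzzy_join(left: List[Name], right: List[Name],
               max_distance: int) -> List[Tuple[str, str, int]]:
    """Pair every left row with every right row whose concatenated
    first_name + last_name is within max_distance edits."""
    matches = []
    for l_first, l_last in left:
        for r_first, r_last in right:
            distance = edit_distance(l_first + l_last, r_first + r_last)
            if distance <= max_distance:
                matches.append((f"{l_first} {l_last}", f"{r_first} {r_last}", distance))
    return matches


# Made-up sample rows; max_distance=2 is a placeholder, not a recommendation.
t1 = [("Jon", "Smith"), ("Alice", "Brown")]
t2 = [("John", "Smith"), ("Alicia", "Browne"), ("Bob", "Stone")]
for match in fuzzy_join(t1, t2, max_distance=2):
    print(match)
```

Note that this compares every row of one table with every row of the other, so the cost grows with the product of the table sizes.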
Is anyone familiar with a "magical range" (or can anyone suggest one from experience) within which a pair of records should fall to be deemed a likely match? I.e., what should the min and max of levenshtein(s, t) be for a pair to be considered a likely match?