I have two data sets that I need to link together: I have to find the records that appear in both data sets, within a certain margin of error (for example, a person's first name is misspelled in one of the sets, or a person moved, or married and thereby got a different surname).
Since the data is sensitive, it should be anonymized. However, I cannot use standard anonymization techniques (hashing, for example), since they would not preserve the properties that are vital to linking records: two values that differ by a single character end up with completely unrelated hashes.
Therefore, I am looking for a way to anonymize my textual data that preserves, for example, the Levenshtein distance between values. Do such techniques exist?
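To illustrate why plain hashing does not work for me, here is a minimal sketch. The two names and the helper functions `levenshtein` and `sha256_hex` are made up for this example and are not part of my real data or pipeline; SHA-256 simply stands in for "a standard hash".

```python
import hashlib


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def sha256_hex(s: str) -> str:
    """Hex digest of the UTF-8 encoded string (hypothetical anonymization step)."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


# Hypothetical pair: the same person, misspelled in one data set.
name_a, name_b = "Jonathan", "Jonathon"

plain_distance = levenshtein(name_a, name_b)
hashed_distance = levenshtein(sha256_hex(name_a), sha256_hex(name_b))

print("distance between plain names: ", plain_distance)
print("distance between hashed names:", hashed_distance)
```

The plain names are one substitution apart, but the two digests look unrelated, so the distance between them tells me nothing about how similar the original names were. That is the property I would need the anonymization to keep.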