The company I work for purchased data cleansing and matching software, which we run every night. Each run takes about fifteen hours.
I have discovered the Fuzzy Grouping and Fuzzy Lookup components in SSIS, which in my experience are extremely fast by comparison. I have some questions:
What algorithms do these components use? I have read articles suggesting that they use Soundex, variations of Soundex, q-grams, Levenshtein distance, or some combination of the four. Is there any documentation that explicitly specifies which algorithm they use?
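
To make clear what I mean by two of these candidates, here is a minimal Python sketch of Levenshtein distance and a q-gram similarity (Jaccard overlap of padded q-grams). The function names, the `#` padding character, and the choice of Jaccard overlap are my own assumptions for illustration. This is not a claim about how the SSIS components are implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 3) -> set:
    """Set of overlapping length-q substrings of s, lowercased and
    padded with '#' (an illustrative choice) so edges also form grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the two q-gram sets, in the range 0.0 to 1.0."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two empty inputs are treated as identical
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(qgram_similarity("Smith", "Smyth"))
```

Levenshtein distance compares two strings character by character, while q-gram measures compare sets of short substrings, which is part of why I would like to know which one (if either) the components rely on.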