0

I'm hoping to implement a fuzzy match algorithm in TSQL (without MDS) that compares full names. The names are coming from separate manual inputs with no controls over what's entered. One of the systems also tends to cut off the end of names as it has a low character limit. As an extreme example of the sort of data I'm looking at:

System1Name                System2Name
-----------                -----------
Katherine G Livingstone    Kat George Livingst

I'm definitely not looking for a perfect match here, but hopefully someone can recommend an algorithm for something like this. I've looked into Jaro-Winkler distance and Soundex, but I'm not sure if I'm better off using a distance based algorithm or a phonetic algorithm with this kind of data. Thoughts?

user3457834
  • 314
  • 3
  • 12
  • Is there anything you can rely on (e.g. three parts on both sides, part order, first character for each part)? What about `KGL==KGL`? How many entries are searched? Is there any chance to heal the input? – Shnugo Sep 01 '18 at 08:45
  • 1
    And please tag the question with your RDBMS (vendor **and version**). `tsql` points to SQL-Server, but this is not sure. – Shnugo Sep 01 '18 at 08:52
  • Unfortunately the answer to all those questions is no, there's nothing that can be fully relied on as it's entirely manual input. `KGL == KGL` probably wouldn't work, as both inputs don't necessarily include a middle name, or one of the inputs could be two names, i.e. `K & J Livingstone`. I'm using SQL Server 2016, so I ended up using the build in SOUNDEX based DIFFERENCE() function on both the full name and only the last name. As my code is only looking to pull out major differences between the two inputs it seems to be working pretty well. – user3457834 Sep 03 '18 at 21:10

0 Answers0