I have two data sets from two different systems being merged together within SQL, however, there is a slight difference within the naming conventions on the two systems. The change in convention is not consistent across the larger data sample but normally requires one modification to match.
System 1 data | System 2 data |
---|---|
AA0330 | AA_330 |
AA0340 | AA_340 |
AA0331 | AA_331 |
AA0341 | AA-341 |
I have been using the below Levenshtein distance SQL function too fuzzy match and get the below result, however, end up with duplicate joins. How could I modify my code to mitigate this?
SELECT [System1].[mvmt1],
[System2].[mvmt2]
FROM [System1]
left join [System2]
ON dbo.ufn_levenshtein([System1].[mvmt1], [System2].[mvmt2]) < 2;
http://www.artfulsoftware.com/infotree/qrytip.php?id=552&m=0
Current output:
System 1 data | System 2 data |
---|---|
AA0330 | AA_330 |
AA0330 | AA_340 |
AA0340 | AA_340 |
AA0331 | AA_331 |
AA0341 | AA-341 |
How can I make sure I only get one outcome from the join?