I'm trying to do a comparison match in snowflake
between the first_name
, middle_name
, and last_name
from a target table (700K rows) and from source table (1.5M rows). The values in the first_name
, middle_name
, and last_name
can be partially, or fully swapped in the source
table, that is the value in the first_name
can be swapped with the value in the last_name
or any other way. For a positive probable identification at least two matches need to be correct. I then do further processing to guarantee the accuracy of the match.
Example:
case 1: first name (target) = first name (source) | middle name (target) = middle name. (MATCH)
case 2: first name (target) = last name (source) | middle name (target) = middle name (source) | last name (target) = first name (source) -> first name swapped with last lame (MATCH)
Example (with table):
Target Table
id | first_na_t | middle_name_t | last_name_t
1 | ffa | mma | lla
2 | ffa | mmb | lla
3 | ffb | ffa | llb
4 | ffc | mmd | lla
...
Source Table
id | first_name_s | middle_name_s | last_name_s
1 | lla | mmb | ffa
5 | ffa | mmb | lla
3 | ffb | ffa | llb
4 | mmd | ffc | lla
...
From the example tables the attempt that I used was to compare the first row in the target
table with all the other rows in the source
table using a CROSS JOIN
with a CASE
afterwards but this is proving to be inefficient and slow.
Is there any way to JOIN
this cleverly where you would compare the rows in the target
table against the rows in the source
table and come out with at least 2 match even though any of the name
columns could be swapped. Also I can not use the id
column.