1

Trying to find the percentage of characters that are matching between 2 columns.

Table A

ID  Name
1   RICK
2   PATRICK
3   MARK.HI

Table B

ID  Name
1   RICK
2   PATRCIK
3   MARK-HI

when i join both the tables (A and B), If column 'Name' matches ('RICK' in Table A = 'RICK' in Table B), then a new column called percentage should show as 100%.

If the column 'Name' does not match when I join both the tables ('PATRICK' in Table A <> 'PATRCIK' in Table B) it should show approximately by what percentage the column is not matching.

I am trying to figure out the messy data.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Sam etl
  • 11
  • 2
  • 1
    In Oracle we use the built in JARO_WINKLER_SIMILARITY function for exactly this purpose. It compares strings and returns a % match. It appears that this is not a built in function for SQL Server, but can it be created. This was discussed on a previous SO thread - https://stackoverflow.com/questions/33882620/jaro-winkler-in-sql-server. Code is provided in the link. – alexherm May 30 '19 at 21:45
  • Do you have Master Data Services? http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/ – BenderBoy May 30 '19 at 21:47

1 Answers1

0

The question isn't overly clear, however I've made the assumption that a match is where the names are in both tables and there can be names in either table that are not in the other:

DDL

CREATE TABLE MyTab1
(
    MyVAL NVARCHAR(20)
);

INSERT INTO MyTab1 VALUES ('Rick'),('Patrick'),('John'),('Jhon');

CREATE TABLE MyTab2
(
    MyVAL NVARCHAR(20)
);

INSERT INTO MyTab2 VALUES ('Rick'),('Patrikc'),('Fred');

Query

DECLARE @TotalRecords DECIMAL= (
                                   SELECT COUNT(*)
                                   FROM (
                                           SELECT b.MyVal AS Val1,
                                                  a.MyVAL AS Val2
                                           FROM   MyTab1 a
                                                  FULL JOIN Mytab2 b
                                                      ON a.MyVal = b.MyVal
                                         ) a
                                 );

DECLARE @Matches DECIMAL  = (
                                 SELECT COUNT(*)
                                 FROM (
                                           SELECT b.MyVal AS Val1,
                                                  a.MyVAL AS Val2
                                           FROM   MyTab1 a
                                                  FULL JOIN Mytab2 b
                                                      ON a.MyVal = b.MyVal
                                      ) a
                                 WHERE Val1 IS NOT NULL AND Val2 IS NOT NULL
                            );

SELECT (@Matches / @TotalRecords) * 100

In this example, the value returned is 16.6% as there are 6 unique values across the two tables of which 1 is common across the two - (1 / 6) * 100 = 16.6

SE1986
  • 2,534
  • 1
  • 10
  • 29