0

I am attempting to write a query that finds when a Name changes but a Customer ID does not. To do this I used

Group by CustID, Name

Then counted Customer IDs, where any result greater than 1 is important. Here is the query I have so far:

Select
    CustID,
    NumDiffNames
from
    (select
        CustID,
        count(CustID) NumDiffNames
    from
        (Select
            CustID,
            Name
        from
            (select
                CustID,
                concat(rtrim(ltrim(FIRSTNAME)),rtrim(ltrim(LASTNAME))) Name
            FROM
                [Branch]
            union
            select
                coCustID as CustID,
                concat(rtrim(ltrim(COFNAME)),rtrim(ltrim(COLNAME))) Name
            FROM
                [Branch]) a
        group by CustID, Name) b
    group by CustID) c
where NumDiffNames>1

However, I am getting many false positives. I removed a large number of them by trimming the whitespace around each name. Another set of false positives is formed by data entry and misspellings. For example in one case the customers name is spelled Vanessa, and for the same Customer ID, the name is spelled Venessa. Or a name is spelled correctly in one case, but two letters are transposed in another John & Jhon for example.

Is it possible to have groups with more variance, like one or two character differences?

Golden Ratio
  • 349
  • 2
  • 14
  • Can you post the tried `query` as well. – Vishwanath Dalvi Aug 22 '17 at 13:32
  • 1
    For names, you can try `soundex()`. However, relational databases are not the best tool for fuzzy string matching. – Gordon Linoff Aug 22 '17 at 13:32
  • There is software available to help out. It even handles Joe vs Joseph and Katherine, Kate, Cathy... – Joe C Aug 22 '17 at 13:33
  • It looks you are searching for [Levenshtein distance](https://stackoverflow.com/a/9195141/842935) – dani herrera Aug 22 '17 at 13:44
  • mr_eclair, I have added the tried query. GordonLinoff, That does remove some results, thank you. However, There are still false positives 'Michael' vs 'Mcihael' are phonetically different. JoeC & danihp I am on the user end of the database and have limited access. Any query with 'create' 'update' or 'insert' are immediately rejected, and I am unable to run additional software to correct the database. – Golden Ratio Aug 22 '17 at 13:55

1 Answers1

0

Could you use something like this?

WITH CTE AS (
    SELECT DISTINCT
         CustID
        ,[name]
        ,ROW_NUMBER() OVER (PARTITION BY CustID, [Name] ORDER BY CustID) AS RowNumber
    FROM Branch )

SELECT * 
FROM CTE 
WHERE RowNumber > 1
user7593937
  • 545
  • 1
  • 5
  • 16