I have a report that displays a list of duplicate accounts based on our business rules. This works when one new account is matched with other existing accounts. Where I'm having trouble is when multiple new accounts match the same existing duplicate. Here's an example of how it looks now, grouped by NewId:

NewID   MatchedID   FirstName   LastName    AddDate      Address        PhoneNumber
10      10          Holly       Johnson     4/18/2013    123 1St Rd.    123 456 7890
10      2           Hollie      Johnson     1/1/1990     123 1St Rd.    123 456 7890

11      11          Holley      Johnson     4/17/2013    123 1St Rd.    123-456-7890
11      2           Hollie      Johnson     1/1/1990     123 First Rd.  123 456 7890

50      50          William     Johnson     4/17/2013    999 2nd St.    222 222 2222
50      3           Bill        Jonson      1/2/1990     999 Second St. 222-222-2222

Accounts that have matches are themselves included for comparison.

So, is there a way to group these similar accounts together without duplicates? It should look like this:

GroupID  AcctID   FirstName   LastName    AddDate      Address        PhoneNumber
1        2        Hollie      Johnson     1/1/1990     123 First Rd.  123 456 7890
1        10       Holly       Johnson     4/18/2013    123 1St Rd.    123 456 7890
1        11       Holley      Johnson     4/17/2013    123 1St Rd.    123-456-7890
2        50       William     Johnson     4/17/2013    999 2nd St.    222 222 2222
2        3        Bill        Jonson      1/2/1990     999 Second St. 222-222-2222

I don't care if the grouping is done in SQL or in SSRS. It would need to reference the two ID columns, because the name, address, and phone number may be different. I also need a new GroupID assigned so that they can be grouped in the report.
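For reference, here's roughly how the sample data above could be set up; the table name and column types are my best guess at the underlying schema:

create table Accounts
(
  NewID int
  , MatchedID int
  , FirstName varchar(50)
  , LastName varchar(50)
  , AddDate date
  , Address varchar(100)
  , PhoneNumber varchar(20)
);

-- the sample rows from the first table above
insert into Accounts values
  (10, 10, 'Holly',   'Johnson', '2013-04-18', '123 1St Rd.',    '123 456 7890')
, (10,  2, 'Hollie',  'Johnson', '1990-01-01', '123 1St Rd.',    '123 456 7890')
, (11, 11, 'Holley',  'Johnson', '2013-04-17', '123 1St Rd.',    '123-456-7890')
, (11,  2, 'Hollie',  'Johnson', '1990-01-01', '123 First Rd.',  '123 456 7890')
, (50, 50, 'William', 'Johnson', '2013-04-17', '999 2nd St.',    '222 222 2222')
, (50,  3, 'Bill',    'Jonson',  '1990-01-02', '999 Second St.', '222-222-2222');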

StevenWhite
  • I'm not really sure what you're asking here. Can you supply the following: A sample of the underlying data, the results you're currently getting with that sample, and the results you'd like to get with that sample. – Ian Preston Apr 19 '13 at 13:28
  • Expanded the examples as suggested. – StevenWhite Apr 19 '13 at 15:47
  • How do you choose between `New -> 10 Matched -> 2` and `New -> 11 Matched -> 2` above? Are you just choosing the one with the lowest `NewID`? Also, are you only looking to get rid of duplicate `MatchedId` values or should all the extra address fields be considered too? – Ian Preston Apr 19 '13 at 16:09
  • Choosing between those two would be arbitrary, they should not be considered distinct. The address etc. fields are all tied to the MatchedId. In other words, I want to group distinct MatchedId's that were related. – StevenWhite Apr 19 '13 at 16:26

1 Answer

You can use a ranking function to eliminate the duplicate rows:

with NoDuplicates as
(
  select *
    -- number the rows within each MatchedID; duplicates get rownum > 1
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
)
select NewID
  , MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , PhoneNumber
from NoDuplicates
where rownum = 1

SQL Fiddle with demo.
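If it helps to see the ROW_NUMBER mechanics in isolation, here's a minimal sketch against inline data rather than your table:

select NewID, MatchedID
  , rownum = row_number() over (partition by MatchedID order by NewID)
from (values (10, 2), (11, 2), (10, 10)) t (NewID, MatchedID);
-- MatchedID 2 gets two rows numbered 1 and 2; MatchedID 10 gets one row numbered 1.
-- Filtering on rownum = 1 then keeps exactly one row per MatchedID.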

Although there's no reason you can't just use GROUP BY, assuming the address information is always duplicated too:

select NewID = min(NewID)
  , MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , PhoneNumber
from Accounts
group by MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , PhoneNumber

SQL Fiddle with demo.

Both of these return your expected result.

Edit after comment:

You can group related rows with a statement like this:

with NoDuplicates as
(
  -- one row per existing account that was matched by at least one new account
  select *
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
  where NewID <> MatchedID
)
select GroupID = MatchedID
  , Acct = MatchedID
  , FirstName
  , LastName
  , AddDate
  , Address
  , PhoneNumber
from NoDuplicates
where rownum = 1
union all
-- add the new accounts themselves, grouped under the account they matched
select GroupID = coalesce(am.MatchedID, a.NewID)
  , Acct = a.MatchedID
  , a.FirstName
  , a.LastName
  , a.AddDate
  , a.Address
  , a.PhoneNumber
from Accounts a
  -- join to the corresponding matched account; null when there was no match
  left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
order by GroupID, Acct

SQL Fiddle with demo.

However, this essentially just uses MatchedID as the GroupID (the coalesce falls back to NewID for accounts that were never matched, so they form their own groups). If you want groups numbered consecutively from 1 instead, you can wrap the statement and apply DENSE_RANK over that key:

with NoDuplicates as
(
  -- one row per existing account that was matched by at least one new account
  select *
    , rownum = row_number() over (partition by MatchedID order by NewID)
  from Accounts
  where NewID <> MatchedID
)
, GroupedAcct as
(
  select GroupID = MatchedID
    , Acct = MatchedID
    , FirstName
    , LastName
    , AddDate
    , Address
    , PhoneNumber
  from NoDuplicates
  where rownum = 1
  union all
  -- add the new accounts themselves, grouped under the account they matched
  select GroupID = coalesce(am.MatchedID, a.NewID)
    , Acct = a.MatchedID
    , a.FirstName
    , a.LastName
    , a.AddDate
    , a.Address
    , a.PhoneNumber
  from Accounts a
    -- join to the corresponding matched account; null when there was no match
    left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
  where a.NewID = a.MatchedID
)
select GroupID = dense_rank() over (order by GroupID)
  , Acct
  , FirstName
  , LastName
  , AddDate
  , Address
  , PhoneNumber
from GroupedAcct
order by GroupID, Acct

SQL Fiddle with demo.
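To make the DENSE_RANK step concrete, here's what it does to the group keys, again sketched on inline data rather than your table:

select OriginalKey = v
  , GroupID = dense_rank() over (order by v)
from (values (2), (2), (2), (50), (50)) t (v);
-- key 2 becomes GroupID 1 and key 50 becomes GroupID 2, numbered consecutively with no gaps.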

Ian Preston
  • The first example seems to be working perfectly. The Fiddle example was nice too. Thanks! – StevenWhite Apr 19 '13 at 17:06
  • The address, phone number, etc. do vary, that's why I couldn't just group on those. – StevenWhite Apr 19 '13 at 17:23
  • Is there a way to assign each of these new groups an ID? This method filters out the duplicates, but I still need a way to associate them with each other in the report, otherwise it's not really useful. – StevenWhite Apr 19 '13 at 22:26
  • @StevenW, doesn't *MatchedID* fulfil this requirement? If not, can you please amend the question to show your required results. Sorry to ask for more information again. – Ian Preston Apr 20 '13 at 22:36
  • @StevenW, thanks for the update, I think I've got it now... I've updated with a couple of similar statements - one that just groups by *MatchedID* and one that creates an artificial ordinal *GroupID*. Hopefully this all works with your real data. – Ian Preston Apr 22 '13 at 23:18