0

i have two tables, ADDRESSES and an additional table CONTACTS. CONTACTS have a SUPERID which is the ID of the ADDRESS they belong to. I want to identify duplicates (same Name, Firstname and Birthday) in the ADDRESSES Table and merge the contacts of these duplicates onto the latest Adress (latest DATECREATE or highest ID of the Adress). Afterwards the other duplicates shall be deleted.

My approach for merging the contacts does not work though. Deleting duplicates works. This is my approach. Would be grateful for support what is wrong here. Thank you!

      UPDATE dbo.CONTACTS
    SET SUPERID = ADDRESSES.ID FROM dbo.ADDRESSES
inner join CONTACTS on ADDRESSES.ID = CONTACTS.SUPERID
        WHERE ADDRESSES.id in (
    SELECT id  FROM dbo.ADDRESSES 
    WHERE EXISTS(
        SELECT NULL FROM ADDRESSES AS tmpcomment
               WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
               AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
               and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
               HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
                       ))

        DELETE FROM ADDRESSES
    WHERE id in (
    SELECT id FROM dbo.ADDRESSES
          WHERE EXISTS(
        SELECT NULL FROM ADDRESSES AS tmpcomment
               WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
               AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
               and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
               HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
                       )
                         )

Here is a sample for understanding the issue.

ADDRESSES

|    ID      | DATECREATE  |   LASTNAME0  | FIRSTNAME0  |    BIRTHDAY1 |
|:-----------|------------:|:------------:|------------:|:------------:|
| 1          |  19.07.2011 |     Arthur   |   James     |  05.05.1980  |
| 2          |  23.08.2012 |     Arthur   |   James     |  05.05.1980  |
| 3          |  11.12.2015 |     Arthur   |   James     |  05.05.1980  |
| 4          |  22.10.2016 |     Arthur   |   James     |  05.05.1980  |
| 6          |  20.12.2014 |     Doyle    |   Peter     |  01.01.1950  |
| 7          |  09.01.2016 |     Doyle    |   Peter     |  01.01.1950  |
|:-----------|------------:|:------------:|------------:|:------------:|

CONTACTS
|    ID      | SUPERID  |
|    1       |    1     |
|    2       |    1     |
|    3       |    2     |
|    4       |    2     |
|    5       |    3     |
|    6       |    4     |
|    7       |    4     |
|    8       |    6     |
|    9       |    6     |
|    10      |    6     |
|    11      |    7     |

The result shall be like this

ADDRESSES
    |    ID      | DATECREATE  |   LASTNAME0  | FIRSTNAME0  |    BIRTHDAY1 |
    |:-----------|------------:|:------------:|------------:|:------------:|
    | 4          |  22.10.2016 |     Arthur   |   James     |  05.05.1980  |
    | 7          |  09.01.2016 |     Doyle    |   Peter     |  01.01.1950  |

    CONTACTS

    |    ID      | SUPERID  |
    |    1       |    4     |
    |    2       |    4     |
    |    3       |    4     |
    |    4       |    4     |
    |    5       |    4     |
    |    6       |    4     |
    |    7       |    4     |
    |    8       |    7     |
    |    9       |    7     |
    |    10      |    7     |
    |    11      |    7     |
HBecks
  • 11
  • 3
  • putting some sample data in will help. I basically understood your requirements as: parent table: Addresses, child table: Contacts. Task: find duplicate address based on 4 columns, and reassign all contacts from earlier duplicates to latest duplicate address, then clean up old addresses and repeated contact data? – Caius Jard Mar 06 '19 at 17:28
  • Thats is the task yes, but with changing the SUPERID of the contacts there is no repeated contact data, it's enough to clean up old addresses. I posted an example below. Hope that clarifies – HBecks Mar 07 '19 at 08:24
  • I presume by the presence of dbo that this is sqlserver? – Caius Jard Mar 07 '19 at 08:45
  • Yes it's SQL Server – HBecks Mar 07 '19 at 08:51

1 Answers1

0

My approach would use a temporary table:

/*


CREATE TABLE addresses
([ID] int, [DATECREATE] varchar(10), [LASTNAME0] varchar(6), [FIRSTNAME0] varchar(5), [BIRTHDAY1] datetime);

INSERT INTO addresses
([ID], [DATECREATE], [LASTNAME0], [FIRSTNAME0], [BIRTHDAY1])
VALUES
(1, '19.07.2011', 'Arthur', 'James', '1980-05-05 00:00:00'),
(2, '23.08.2012', 'Arthur', 'James', '1980-05-05 00:00:00'),
(3, '11.12.2015', 'Arthur', 'James', '1980-05-05 00:00:00'),
(4, '22.10.2016', 'Arthur', 'James', '1980-05-05 00:00:00'),
(6, '20.12.2014', 'Doyle', 'Peter', '1950-01-01 00:00:00'),
(7, '09.01.2016', 'Doyle', 'Peter', '1950-01-01 00:00:00');


CREATE TABLE contacts
([ID] int, [SUPERID] int);

INSERT INTO contacts
([ID], [SUPERID])
VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 4),
(7, 4),
(8, 6),
(9, 6),
(10, 6),
(11, 7);

*/


DROP TABLE IF EXISTS #t; --sqls2016+ only, google for an older method if yours is sub 2016
SELECT id as oldid, MAX(id) OVER(PARTITION BY lastname0, firstname0, birthday1) as newid INTO #t
FROM 
  addresses;

/*now #t contains data like 
1, 4
2, 4
3, 4
4, 4
6, 7
7, 7*/

--remove the ones we don't need to change
DELETE FROM #t WHERE oldid = newid;

BEGIN TRANSACTION;
SELECT * FROM addresses;
SELECT * FROM contacts;

--now #t is the list of contact changes we need to make, so make those changes
UPDATE contacts
SET contacts.superid = #t.newid
FROM
  contacts INNER JOIN #t ON contacts.superid = #t.oldid;

--now scrub the old addresses with no contact records. This catches all such records, not just those in #t
DELETE FROM addresses WHERE id NOT IN (SELECT DISTINCT superid FROM contacts);

--alternative to just clean up the records we affected in this operation
DELETE FROM addresses WHERE id IN (SELECT oldid FROM #t);

SELECT * FROM addresses;
SELECT * FROM contacts;
ROLLBACK TRANSACTION;

Please note, i have tested this and it produces the results you want but I advocate caution copying an update/delete query off the internet and running. I've inserted a transaction that selects the data before and after and rolls back the transaction so nothing gets wrecked. Run it on a test db first though!

Caius Jard
  • 72,509
  • 5
  • 49
  • 80