Merging two parent > child table sets

Question

I need to get the data in two parent > child table sets merged/combined into a third parent > child table.

The tables look like this:

Table structure

The only difference in the three sets of tables is that TableC has a TableType column to help discern the difference between a TableA record and a TableB record.

My first thought was to use a cursor.. Here's code to create the table structure, insert some records, and then merge the data together. It works very well, sooooo....

--Create the tables

CREATE TABLE TableA
(
    ID int not null identity primary key,
    Name VARCHAR(30)
);

CREATE TABLE TableAChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_A FOREIGN KEY (Parent) REFERENCES TableA(ID)
);

CREATE TABLE TableB
(
    ID int not null identity primary key,
    Name VARCHAR(30)
);

CREATE TABLE TableBChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_B FOREIGN KEY (Parent) REFERENCES TableB(ID)
);

CREATE TABLE TableC
(
    ID int not null identity primary key,
    TableType VARCHAR(1),
    Name VARCHAR(30)
);

CREATE TABLE TableCChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_C FOREIGN KEY (Parent) REFERENCES TableC(ID)
);

-- Insert some test records.. 

INSERT INTO TableA (Name) Values ('A1')
INSERT INTO TableAChild (Name, Parent) VALUES ('A1Child', SCOPE_IDENTITY())
INSERT INTO TableB (Name) Values ('B1')
INSERT INTO TableBChild (Name, Parent) VALUES ('B1Child', SCOPE_IDENTITY())

-- Needed throughout.. 
DECLARE @ID INT

-- Merge TableA and TableAChild into TableC and TableCChild
DECLARE TableACursor CURSOR
    -- Get the primary key from TableA
    FOR SELECT ID FROM TableA
OPEN TableACursor
    FETCH NEXT FROM TableACursor INTO @ID

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- INSERT INTO SELECT the parent record into TableC, being sure to specify a TableType
        INSERT INTO TableC (Name, TableType) SELECT Name, 'A' FROM TableA WHERE ID = @ID

        -- INSERT INTO SELECT the child record into TableCChild using the parent ID of the last row inserted (SCOPE_IDENTITY())
        -- and the current record from the cursor (@ID).
        INSERT INTO TableCChild(Name, Parent) SELECT Name, SCOPE_IDENTITY() FROM TableAChild WHERE Parent = @ID

        FETCH NEXT FROM TableACursor INTO @ID
    END;

CLOSE TableACursor
DEALLOCATE TableACursor

-- Repeat for TableB
DECLARE TableBCursor CURSOR
    FOR SELECT ID FROM TableB
OPEN TableBCursor
    FETCH NEXT FROM TableBCursor INTO @ID

    WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO TableC (Name, TableType) SELECT Name, 'B' FROM TableB WHERE ID = @ID
        INSERT INTO TableCChild(Name, Parent) SELECT Name, SCOPE_IDENTITY() FROM TableBChild WHERE Parent = @ID
        FETCH NEXT FROM TableBCursor INTO @ID
    END;

CLOSE TableBCursor
DEALLOCATE TableBCursor

Now, my question(s):

I've always been told that cursors are bad. But I couldn't find another way of doing it. I'm wondering if there's some way to do that with a CTE?
If the cursor is appropriate in this situation, how did I do? Is there a better way of doing what I did? It doesn't look very DRY to me, but I'm no SQL expert.

Lastly, if you want to re-run the query above, here's a small script to delete the tables that were created.

DROP TABLE TableAChild
DROP TABLE TableBChild
DROP TABLE TableCChild

DROP TABLE TableA
DROP TABLE TableB
DROP TABLE TableC

The correct result should look like:

Desired result

If the name fields in `tablea` and `tableb` are unique per table, then you could rewrite this without a cursor and just use a `join`. However, if they aren't unique, then the cursor is the only option I see to get the `id` field from the parent. — sgeddes, May 13 '15 at 13:37
@sgeddes Are you referring to the *values* in the Name column? If so, they are not unique. — Alex Dresko, May 13 '15 at 13:42
Why not just add another column to TableC for LegacyID. This will give you a value to use a join. There is no need to do this kind of RBAR. — Sean Lange, May 13 '15 at 13:47
@SeanLange Hrmmm, you've got me thinking, but I'm not sure how it would work. There are relationships between the TableX.ID column and TableXChild.Parent column in my database (apologies for not including that in the sample, as I didn't think it was necessary). And I definitely don't want to leave behind a junk LegacyID column. Can you explain, possibly in an answer? — Alex Dresko, May 13 '15 at 13:54
You could add that LegacyID column as a temporary thing. It would allow to use it to join the child rows to the new parent. Once you are done merging the data you could drop that column because it wouldn't have any meaning anymore. — Sean Lange, May 13 '15 at 14:02
@SeanLange Does your suggest take into account that there are new child records that have to be created as well? I feel like I'm close to understanding, but am missing the connection to the new child records. — Alex Dresko, May 13 '15 at 14:06
FYI, I updated the table creation scripts to be more specific about primary keys and relationships. — Alex Dresko, May 13 '15 at 15:26

score 4 · Answer 1 · edited May 23 '17 at 12:05

You can use merge as described by Adam Machanic in Dr. OUTPUT or: How I Learned to Stop Worrying and Love the MERGE and in this question to get a mapping between the new identity value and the old primary key value in a table variable and the use that when you insert to your child tables.

declare @T table(ID int, IDC int);

merge dbo.TableC as C
using dbo.TableA as A
on 0 = 1
when not matched by target then
  insert (TableType, Name) values('A', A.Name)
output A.ID, inserted.ID into @T(ID, IDC);

insert into dbo.TableCChild(Parent, Name)
select T.IDC, AC.Name
from dbo.TableAChild as AC
  inner join @T as T
    on AC.Parent = T.ID;

delete from @T;

merge dbo.TableC as C
using dbo.TableB as B
on 0 = 1
when not matched by target then
  insert (TableType, Name) values('B', B.Name)
output B.ID, inserted.ID into @T(ID, IDC);

insert into dbo.TableCChild(Parent, Name)
select T.IDC, BC.Name
from dbo.TableBChild as BC
  inner join @T as T
    on BC.Parent = T.ID;

SQL Fiddle

This is definitely pretty magical. And, surprisingly, I understand it without even having looked at the articles you referenced. The 0=1 feels a bit hackish though. Seems like you're using `merge` just so you can take advantage of `output` and `inserted`. In .NET, we tend to frown on that kind of abuse. But, from what I can tell, it seems to be an acceptable usage of of `merge` online. Let's wait and see if there are any other submissions.. — Alex Dresko, May 15 '15 at 19:39
Really clever use of merge since output of normal insert cannot get A.id into table variable which is why in my solution i had to jump though loops to match both ids — ughai, May 15 '15 at 19:52

score 1 · Accepted Answer · edited May 15 '15 at 20:32

1

Here is one way to do this without a cursor or other RBAR type stuff.

ALTER TABLE TableC ADD LegacyID INT
GO

INSERT INTO TableC (TableType, Name, LegacyID)
SELECT 'A', Name, ID
FROM TableA

INSERT TableCChild
SELECT C.ID, AC.Name
FROM TableAChild AC
JOIN TableA A ON A.Id = AC.ID
JOIN TableC C ON C.LegacyID = A.ID AND C.TableType = 'A'

INSERT INTO TableC (TableType, Name, LegacyID)
SELECT 'B', Name, ID
FROM TableB

INSERT TableCChild
SELECT C.ID, AC.Name
FROM TableBChild AC
JOIN TableB A ON A.Id = AC.ID
JOIN TableC C ON C.LegacyID = A.ID AND C.TableType = 'B'

ALTER TABLE TableC DROP COLUMN LegacyID
GO

edited May 15 '15 at 20:32

Alex Dresko

5,179
3
37
57

answered May 13 '15 at 14:07

Sean Lange

33,028
3
25
40

Unfortunately, this is a very incomplete example. Plus, TableX.ID is an identity column, so you can't insert records into it.. and you can't turn on identity insert because you'd have conflicts when merging the records from TableB and TableBChild. I feel like you've got something going though, so I'd love to see it finished. – Alex Dresko May 13 '15 at 14:13
You're not even using LegacyId. :) – Alex Dresko May 13 '15 at 14:15
I think what he means is `from TableAChild ac join TableC A on A.LegacyID = ac.ID` – ughai May 13 '15 at 14:16
Oops. I totally screwed that up. Please see the update that is now using TableC to retrieve the new identity value. – Sean Lange May 13 '15 at 14:22
Doesn't run without errors. Can you use the script I included in the OP to generate tables and data to confirm that your answer works properly? – Alex Dresko May 13 '15 at 14:57
I did and it runs just fine. :) What errors are you seeing? – Sean Lange May 13 '15 at 14:59
I think it was just that there wasn't a `GO` after the `ALTER TABLE`. Or maybe it needed more specific column information in the INSERTS. I cleaned your answer up a bit and included support for TableB. Problem is the result now.. I'll edit my OP so you can see the result I'm getting. – Alex Dresko May 13 '15 at 15:20
I edited @SeanLange's answer to produce the correct result. I hate that I had to do so much of the work on my own for this answer to be valid, but he did at least point me in the right direction. So far, it is my favorite answer, as it is the simplest. – Alex Dresko May 15 '15 at 20:34
Glad it worked for you. I sort of assumed you would have to do a lot of the work since I would hope that the table and column names you provided were just examples. – Sean Lange May 15 '15 at 20:41
They were examples, but your last response definitely did not produce the desired output. – Alex Dresko May 15 '15 at 20:44

ughai · Answer 3 · 2015-05-13T14:08:09.797

You can use a map table to link the old and new ids together based on some key.

In my example, I am using the order of insertion into TableC.

Create a map table with an identity column.
Add data in TableC table based on order of ID of TableA and get the inserted ids in the map
Use the same order of TableA.id to get a ROWNUMBER() and match it with the identity column of the map table and update the old_id in map to match TableA.id with TableC.id .
Use the map to insert into the TableCChild table
Truncate the map and rinse and repeat for other tables.

Sample Query

CREATE TABLE  #map(id int identity,new_id int,old_id int);
INSERT INTO TableC
(
    TableType,
    Name
)output inserted.id into #map(new_id)
SELECT 'A',Name
FROM TableA
ORDER BY ID


update m
set m.old_id = ta.id
FROM #map m
inner join 
(
select row_number()OVER(order by id asc) rn,id
from tableA
)ta on ta.rn = m.id

INSERT INTO TableCChild (Name, Parent) 
SELECT Name,M.new_ID
FROM #Map M
INNER JOIN TableAChild TA ON M.old_id = TA.Parent

TRUNCATE TABLE #map

INSERT INTO TableC
(
    TableType,
    Name
)output inserted.id into #map(new_id)
SELECT 'B',Name
FROM TableB
ORDER BY ID

update m
set m.old_id = tb.id
FROM #map m
inner join 
(
select row_number()OVER(order by id asc) rn,id
from tableB
)tb on tb.rn = m.id

INSERT INTO TableCChild (Name, Parent) 
SELECT Name,M.new_ID
FROM #Map M
INNER JOIN TableBChild TB ON M.old_id = TB.Parent

DROP TABLE #Map

if it is possible to add an additional column to the table `TableC`, you can use the table itself as a map like Sean Lange suggested and then remove the column afterwards. — ughai, May 13 '15 at 14:06
This answer worked to produce the desired result. It looks more complicated than the cursor approach. I'll have to study it some though. And let's see what other think by way of voting. — Alex Dresko, May 13 '15 at 14:10

score 0 · Answer 4 · answered May 19 '15 at 16:02

I just wrote the following SQL to do it if the Name is unique in TableA and unique in TableB

INSERT INTO TableCChild
  (
    Parent,
    NAME
  )
SELECT tc.ID,
       ta.Name
FROM   TableAChild  AS ta
       JOIN TableA a
            ON  a.ID = ta.Parent
       JOIN TableC  AS tc
            ON  tc.Name = a.Name
                AND tc.TableType = 'A' 
UNION
SELECT tc.ID,
       tb.Name
FROM   TableBChild  AS tb
       JOIN TableB b
            ON  b.ID = tb.Parent
       JOIN TableC  AS tc
            ON  tc.Name = b.Name
                AND tc.TableType = 'B'

If Name is not unique and only the ID is the Unique Identifier then I would add the LegacyId as suggested and the code would then be as follows

/* Change Table C to Have LegacyId as well and this is used to find the New Key for Inserts
CREATE TABLE TableC
(
    ID            INT NOT NULL IDENTITY PRIMARY KEY,
    TableType     VARCHAR(1),
    LegacyId     INT,
    NAME          VARCHAR(30)
);
*/

INSERT INTO TableC (Name, TableType, LegacyId) 
SELECT DISTINCT NAME,
       'A', 
       Id
FROM   TableA
UNION
SELECT DISTINCT NAME,
       'B',
       Id
FROM   TableB

    INSERT INTO TableCChild
      (
        Parent,
        NAME
      )
    SELECT tc.ID,
           ta.Name
    FROM   TableAChild  AS ta
           JOIN TableA a
                ON  a.ID = ta.Parent
           JOIN TableC  AS tc
                ON  tc.LegacyId = a.Id
                    AND tc.TableType = 'A' 
    UNION
    SELECT tc.ID,
           tb.Name
    FROM   TableBChild  AS tb
           JOIN TableB b
                ON  b.ID = tb.Parent
           JOIN TableC  AS tc
                ON  tc.LegacyId = b.Id
                    AND tc.TableType = 'B'

Isn't this the same as Sean's answer, except you're UNION'ing everything together? If so, what's the benefit of the UNION here? — Alex Dresko, May 19 '15 at 18:24
Almost the same as his yes, there is no real advantage to the union actually, but in the first query if your names are unique in a and separately unique in b you don't need the legacy Id, meaning a simpler solution. I tested mine and made sure it ran without errors too :) — TonyM, May 19 '15 at 20:15
Also I'm more a proponent of keeping the legacyid if it was necessary, or leaving it off all together if not. — TonyM, May 19 '15 at 20:25

score 0 · Answer 5 · answered May 21 '15 at 07:05

We can reach this by turning the Identity column off till we finish the insertion like the following example.

--Create the tables

CREATE TABLE TableA
(
    ID int not null identity primary key,
    Name VARCHAR(30)
);

CREATE TABLE TableAChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_A FOREIGN KEY (Parent) REFERENCES TableA(ID)
);

CREATE TABLE TableB
(
    ID int not null identity primary key,
    Name VARCHAR(30)
);

CREATE TABLE TableBChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_B FOREIGN KEY (Parent) REFERENCES TableB(ID)
);

CREATE TABLE TableC
(
    ID int not null identity primary key,
    TableType VARCHAR(1),
    Name VARCHAR(30)
);

CREATE TABLE TableCChild
(
    ID int not null identity primary key,
    Parent int not null,
    Name VARCHAR(30),
    CONSTRAINT FK_C FOREIGN KEY (Parent) REFERENCES TableC(ID)
);

-- Insert some test records.. 

INSERT INTO TableA (Name) Values ('A1')
INSERT INTO TableAChild (Name, Parent) VALUES ('A1Child', SCOPE_IDENTITY())
INSERT INTO TableB (Name) Values ('B1')
INSERT INTO TableBChild (Name, Parent) VALUES ('B1Child', SCOPE_IDENTITY())

SET IDENTITY_INSERT TableC ON
INSERT INTO TableC(ID, TableType, Name)
SELECT ID, 'A', Name FROM TableA

INSERT INTO TableCChild(Parent, Name)
SELECT Parent, Name FROM TableAChild

DECLARE @MAXID INT
SELECT @MAXID = MAX(ID) FROM TableC
PRINT @MAXID

SET IDENTITY_INSERT TableC ON
INSERT INTO TableC(ID, TableType, Name)
SELECT ID + @MAXID, 'B', Name FROM TableB
SET IDENTITY_INSERT TableC OFF

INSERT INTO TableCChild(Parent, Name)
SELECT Parent + @MAXID, Name FROM TableBChild

SET IDENTITY_INSERT TableC OFF

SELECT * FROM TableC
SELECT * FROM TableCChild

DROP TABLE TableAChild
DROP TABLE TableBChild
DROP TABLE TableCChild

DROP TABLE TableA
DROP TABLE TableB
DROP TABLE TableC

score 0 · Answer 6 · answered May 21 '15 at 08:18

If you need to insert records in third table TableC and TableCChild for later use then it's fine to insert data in these tables but if you only need this table data to use it in your stored procedure for the time being then you can also just work with first two tables to get the desired result.

select * from (
select a.ID,'A' as TableType,a.Name from TableA a inner join TableAChild b on a.ID=b.ID
union
select a.ID,'B' as TableType,a.Name  from TableB a inner join TableBChild b on a.ID=b.ID) TableC

Similarly get TableCChild

select * from 
(
select b.ID,b.Parent,b.Name  from TableA a inner join TableAChild b on a.ID=b.ID
union
select b.ID,b.Parent,b.Name   from TableB a inner join TableBChild b on a.ID=b.ID) TableCChild

And if you have to insert in TableC and TableCChild then you have to recreate TableC with primary key on ID and TableType, and turn off the identity for ID column.

Merging two parent > child table sets

6 Answers6