Join / Merge two tables removing new duplicates [PROC SQL in SAS]

Question

similar questions have been asked on the forums but I seem to have a unique issue with mine. I'm not sure if this is because I don't have a unique ID or because my KEY is my actual data. I hope you guys can help.

I am trying to merge two tables (Old and New) that have identical column structures.

I want to retain all my values in the Old table and append ONLY new variables from New Table into a Combined Table. Any keys that exist in both tables should take on the value of the Old table.

OLD TABLE
Key | Points
AAA | 1
BBB | 2
CCC | 3

NEW TABLE
Key | Points
AAA | 2
BBB | 5
CCC | 8
DDD | 6

Combined TABLE
Key | Points
AAA | 1
BBB | 2
CCC | 3
DDD | 6

I feel like what I want to achieve is the venn diagram equivalent of this:

Venn diagram

... but for whatever reason I'm not getting the intended effect with this code:

CREATE TABLE Combined
SELECT * FROM Old as A
FULL OUTER JOIN New as B ON A.Key=B.Key
WHERE A.Key IS NULL OR B.Key IS NULL;

I was searching SO while figuring this out and couldn't seem to arrive at a solution that worked? — Wolfspirit, Dec 08 '15 at 06:01
`SELECT isnull(old.Key, new.Key) Key, isnull(old.Points, new.Points) Points FROM Old FULL OUTER JOIN New ON old.Key = new.Key` — Eric, Dec 08 '15 at 06:02
Looks like SAS has severe limitations - it doesn't recognise ISNULL as a function either :/ — Wolfspirit, Dec 08 '15 at 22:39

score 0 · Answer 1 · answered Dec 08 '15 at 06:04

0

This might help you.

SELECT B.[Key], MIN(CASE WHEN A.[Key] = B.[Key] THEN A.Points ELSE B.Points END) AS 'Points' FROM OldTable A CROSS APPLY NewTable B GROUP BY B.[Key]

answered Dec 08 '15 at 06:04

Tonton Sevilla

11
4

Its worth mentioning that I'm using the Proc SQL function in SAS and it doesn't recognise the Cross Apply function – Wolfspirit Dec 08 '15 at 06:15
To make sure that the points to be shown is the first points that was use for each key. Is that what you want to achieve? – Tonton Sevilla Dec 08 '15 at 06:18
What is the SQL Version you are using? – Tonton Sevilla Dec 08 '15 at 06:19
If CROSS APPLY is not working you can try this with same output SELECT B.[Key], MIN(CASE WHEN A.[Key] = B.[Key] THEN A.Points ELSE B.Points END) AS 'Points' FROM OldTable A, NewTable B GROUP BY B.[Key] – Tonton Sevilla Dec 08 '15 at 06:21
I'm testing your code now Tonton. I'm still wondering if your MIN() function would just pick up the smaller of two values from either key? The newer point allocated might not be larger. – Wolfspirit Dec 08 '15 at 22:50
Yep okay it took 50 mins to process and it picked the smaller value as opposed to the older value as I suspected :/ – Wolfspirit Dec 08 '15 at 23:32

score 0 · Answer 2 · answered Dec 09 '15 at 02:20

0

As long as there are no duplicate values for key in either table:

SELECT COALESCE(a.key,b.key) AS key, COALESCE(a.points,b.points) AS points
  FROM old a FULL OUTER JOIN new b ON a.key EQ b.key

Coalesce returns the first value if the first value is not missing, and returns the second value otherwise.

answered Dec 09 '15 at 02:20

Ludwig61

141
1
7

Unfortunately there are duplicate keys, of which we want to retain values from the older version. – Wolfspirit Dec 09 '15 at 03:14
Or did you mean duplicates within each individual table in itself? – Wolfspirit Dec 09 '15 at 03:15
I meant duplicates within a source table. As long as all key values in a given table are unique, and points are never null/missing, the above works. – Ludwig61 Dec 09 '15 at 13:47

score 0 · Answer 3 · answered Dec 09 '15 at 14:04

If you don't have duplicate keys within either table, then a simple update statement in a data step will do the job. You just need to make sure that NEW_TABLE is the first in the list, so the values in OLD_TABLE will replace those where the key matches. Any keys unique to one table will be output automatically.

Your data needs to be sorted by Key, as in your example.

data OLD_TABLE;
input Key $ Points;
datalines;
AAA 1
BBB 2
CCC 3
;
run;

data NEW_TABLE;
input Key $ Points;
datalines;
AAA 2
BBB 5
CCC 8
DDD 6
;
run;

data want;
update new_table old_table;
by key;
run;

RamB · Accepted Answer · 2015-12-09T20:11:06.050

0

Order the datasets

proc sort data=old; 
    by key; 
run;
proc sort data=new; 
    by key; 
run;

Combine them with a data set with by, output only the first key if there is a match

data combined;
set 
    old
    new;
by key;
if first.key then output;
run;

edited Dec 09 '15 at 20:11

answered Dec 09 '15 at 17:05

RamB

428
4
11

I ended up using just SAS instead of PROC SQL. Your implementation is a lot less convoluted than mine was! Thanks. – Wolfspirit Dec 10 '15 at 03:04
No problem :) Just be aware that this code will remove all duplicate key values even if they are only in the new table. The combined table will store only the first occurrence of said key. – RamB Dec 10 '15 at 12:46

Join / Merge two tables removing new duplicates [PROC SQL in SAS]

4 Answers4