SAS - need to use .first .last obs but how to specify with multiple key/compound key?

Question

I'm trying to use .first and .last obs to get rid of BOTH duplicates in pairs of duplicates (by writing out dupes and uniques to separate tables). The issue is that my key is made up of several variables - a household id, product name, and date variable (actually day and month of a date field). How do I properly specify this using SAS? If it is easier using PROC SQL, that's fine too.

You should post what you've tried so far. Also, look into PROC SORT and the DUPOUT, NODUPKEY options. — Reeza, Sep 14 '15 at 14:46

score 2 · Answer 1 · answered Sep 14 '15 at 13:18

2

When using BY processing to access duplicates you really only need to look at the FIRST. and LAST. variables for the last variable in the BY list.

data nodups dups ;
  set have ;
  by household product date;
  if (first.date and last.date) then output nodups;
  else output dups;
run;

This will output the unique records to one dataset and those with duplicates to another.

answered Sep 14 '15 at 13:18

Tom

47,574
2
16
29

Thanks - I'd never seen an actual example so I assumed I had to somehow include all keys as a specification on the first. last. - I'll try it! – user3791254 Sep 14 '15 at 15:07

score 1 · Answer 2 · answered Sep 14 '15 at 15:14

In 9.4 you have a new option: NOUNIQUEKEY.

proc sort data=have nouniquekey uniqueout=ds_uniques out=ds_dups;
  by your_key_variables_here;
run;

It takes any observation with a unique (exactly 1) count and writes it to the UNIQUEOUT= dataset, and then anything with 2+ (so, a duplicate) is written to the normal OUT= dataset (both copies of the duplicate, not just one like NODUPKEY).

score -1 · Answer 3 · answered Sep 14 '15 at 13:30

-1

SQL for unique observations would be -

proc sql;
create table want as
select distinct household,product,date
from have
;
quit;

Then transversely, you could do a count query to ID your duplicates...But Data step may be better since you can output your dup results in the same step.

answered Sep 14 '15 at 13:30

SMW

470
1
4
19

Thanks for the SQL option – user3791254 Sep 14 '15 at 15:07
This wouldn't do what he's asking. This is identical (more or less) to `PROC SORT NODUPKEY`, which is a distinct result from what is requested. – Joe Sep 14 '15 at 15:14
Correct (as in yes, this is like the nodupkey option). I must have misinterpreted the question, but I also didn't realize you could do what you suggested (nouniquekey option)...seems useful! – SMW Sep 14 '15 at 15:17

SAS - need to use .first .last obs but how to specify with multiple key/compound key?

3 Answers3