2

I'm trying to use .first and .last obs to get rid of BOTH duplicates in pairs of duplicates (by writing out dupes and uniques to separate tables). The issue is that my key is made up of several variables - a household id, product name, and date variable (actually day and month of a date field). How do I properly specify this using SAS? If it is easier using PROC SQL, that's fine too.

Joe
  • 62,789
  • 6
  • 49
  • 67
user3791254
  • 45
  • 1
  • 1
  • 5
  • You should post what you've tried so far. Also, look into PROC SORT and the DUPOUT, NODUPKEY options. – Reeza Sep 14 '15 at 14:46

3 Answers3

2

When using BY processing to access duplicates you really only need to look at the FIRST. and LAST. variables for the last variable in the BY list.

data nodups dups ;
  set have ;
  by household product date;
  if (first.date and last.date) then output nodups;
  else output dups;
run;

This will output the unique records to one dataset and those with duplicates to another.

Tom
  • 47,574
  • 2
  • 16
  • 29
  • Thanks - I'd never seen an actual example so I assumed I had to somehow include all keys as a specification on the first. last. - I'll try it! – user3791254 Sep 14 '15 at 15:07
1

In 9.4 you have a new option: NOUNIQUEKEY.

proc sort data=have nouniquekey uniqueout=ds_uniques out=ds_dups;
  by your_key_variables_here;
run;

It takes any observation with a unique (exactly 1) count and writes it to the UNIQUEOUT= dataset, and then anything with 2+ (so, a duplicate) is written to the normal OUT= dataset (both copies of the duplicate, not just one like NODUPKEY).

Joe
  • 62,789
  • 6
  • 49
  • 67
-1

SQL for unique observations would be -

proc sql;
create table want as
select distinct household,product,date
from have
;
quit;

Then transversely, you could do a count query to ID your duplicates...But Data step may be better since you can output your dup results in the same step.

SMW
  • 470
  • 1
  • 4
  • 19
  • Thanks for the SQL option – user3791254 Sep 14 '15 at 15:07
  • This wouldn't do what he's asking. This is identical (more or less) to `PROC SORT NODUPKEY`, which is a distinct result from what is requested. – Joe Sep 14 '15 at 15:14
  • Correct (as in yes, this is like the nodupkey option). I must have misinterpreted the question, but I also didn't realize you could do what you suggested (nouniquekey option)...seems useful! – SMW Sep 14 '15 at 15:17