SAS: Proc sort nodupkeys error

Question

I have two data sets :

"mothers" - 5,512 observations where the variable "MOTHER" = 1
"all_women" - 2,336,750 observations where the variable "MOTHER" = 0

I combine the two as follows:

data combined;
set mothers all_women;

Now as the mothers are already in the datset all_women, I want to delete the repeated entries - with the condition that I keep the observations where "MOTHER"=1.

I tried the following:

proc sort data=combined; by ID DESCENDING MOTHER; run;

proc sort data=combined nodupkeys; by ID; run;

yet I lose some of the mothers because I am left with only 5458 observations where "MOTHER"=1. What have I done to introduce this error?

It seems weird to SET the two data sets together and expect the same number of records as the first data set, I would have expected a merge instead of SET in your combined data set. — Reeza, Oct 16 '14 at 14:03

score 2 · Accepted Answer · answered Oct 16 '14 at 15:15

Instead of using NODUPKEY, use FIRST./LAST. processing.

proc sort data=combined; 
  by ID DESCENDING MOTHER; 
run;

data want;
  set combined;
  by ID descending mother;
  if not (first.ID) and (mother=0) then delete;
run;

That would keep any ID that had mother=0 only (keep 1 record per), and would keep all mother=1 rows.

score 1 · Answer 2 · answered Oct 16 '14 at 13:23

1

Have you checked whether there were any duplicate IDs in the mothers dataset? The second proc sort would have eliminated those rows.

You can check like so:

proc sort data = mothers nodupkey out = mothers_dedup dupout = mothers_dups;
  by ID;
run;

If mothers_dups contains more than 0 observations, this might account for the problem.

answered Oct 16 '14 at 13:23

user667489

9,501
2
24
35

Just checked - the source of the problem is that there are duplicate IDs in the mothers dataset. However I need to keep these duplicates as they correspond to multiple births. Is there a way to only eliminate the duplicates where "MOTHER" = 0 ? – user2568648 Oct 16 '14 at 13:34

SAS: Proc sort nodupkeys error

2 Answers2