I have two data sets :
- "mothers" - 5,512 observations where the variable "MOTHER" = 1
- "all_women" - 2,336,750 observations where the variable "MOTHER" = 0
I combine the two as follows:
data combined;
set mothers all_women;
Now as the mothers are already in the datset all_women, I want to delete the repeated entries - with the condition that I keep the observations where "MOTHER"=1.
I tried the following:
proc sort data=combined; by ID DESCENDING MOTHER; run;
proc sort data=combined nodupkeys; by ID; run;
yet I lose some of the mothers because I am left with only 5458 observations where "MOTHER"=1. What have I done to introduce this error?