3

I've been trying unsuccesfully for sometime now to collapse a data set using a PROC SQL with GROUPBY and was wondering if I could get some help. Here is an example of what I am trying to do. Suppose we have the following data:

id   year  parent_id  age
"01" 1990    "23"      17
"01" 1991    "23"      18
"01" 1992    "23"      19
"02" 1978    "18"      24
"02" 1979    "18"      25

that we wanted to collapse by id preserving the row with the min age across years to get the following dataset

id   year   parent_id   age
"01" 1990   "23"        17
"02" 1978   "18"        24

I tried something along the lines of

proc sql;
    CREATE TABLE output_tablename as
       SELECT DISTINCT id, year, parent_id, min(age) as age
       FROM input_tablename
       GROUPBY id;
quit;

to no avail.

student_t
  • 243
  • 2
  • 9
  • 2
    Using a `data` step here would be easy. For `proc sql`, you could join the `year` and `parent_id` after the `min(age)` calculation on `id` and `age`. something like - `proc sql; create table output_tablename as select a.*, b.year, b.parent_id from ( select id, min(age) as age from input_tablename group by id) a left join input_tablename b on a.id=b.id and a.age=b.age; quit;` – samkart Nov 16 '18 at 19:49
  • What does *to no avail* mean? What is the error or undesired result? Why not minimize all values except the grouping variable, *id*? – Parfait Nov 16 '18 at 20:46
  • Sorry for the confusion, I was getting duplicate results in my output dataset but I think this was actually a result of another mistake. However, the alternate methods suggested are very useful! – student_t Nov 16 '18 at 21:33

1 Answers1

3

You can use the HAVING clause to pick only records where age = min(age).

   proc sql;
   create table want as
   select * from have
   group by ID
   having age=min(age);
   quit;

PROC SORT option:

proc sort data=have; by id descending age;
run; 
proc sort data=have nodupkey out=want; 
by id;
run;
Reeza
  • 20,510
  • 4
  • 21
  • 38