Filtering higher values using proc sql

Question

I would need to filter a dataset based on the higher value of var1 per each group.

I have this dataset:

Var1  t  avg
AA   1  0.02
AA   0  0.21
BB   1  0.05
BB   0  0.20
CC   1  0.10
CC   0  0.14

Built as follows

Proc sql;
    Select 
    Var1,
    t,
    Avg(var2) as avg
    From my_data
    Group by 1,2
    Order by 1,2; quit;

My expected output would be

    AA   0  0.21
    BB   0  0.20
    CC   0  0.14

Could you please give me some tips on how to do it? I think I should do something to rank them first, then select the observations having rank value = 1 (if in descending order)

I think you mean the higher value of the average . . . and the example is a little misleading because all the time values are `0`. — Gordon Linoff, Sep 01 '20 at 14:42

score 0 · Answer 1 · answered Sep 01 '20 at 14:44

0

This is rather tricky in proc sql. I think this will do what you want:

proc sql;
    select Var1, t, avg(avg_var2) as average
    from my_data d
    group by 1
    having avg(avg_var2) = (select max(average)
                            from (select avg(avg_var2) as average
                                  from my_data d2
                                  where d2.var2 = d.var
                                  group by var1, t
                                 )
                           );

answered Sep 01 '20 at 14:44

Gordon Linoff

1,242,037
58
646
786

Thanks Gordon. I have got however no values. Did you get the same results? – Sep 01 '20 at 14:59
@Math . . . This might be better done with a hybrid solution or saving the aggregated data to a dataset and then using that to get the max. – Gordon Linoff Sep 01 '20 at 15:25

score 0 · Accepted Answer · answered Sep 01 '20 at 16:00

From your input, add a next step using one of the approaches below.

You can use SQL with the HAVING clause to get the max.

proc sql;
create table want as
select var1, t, avg
from have 
group by var1
having avg=max(avg);
quit;

Or, assuming no duplicates are possible in the average value you can easily use a PROC SORT.

proc sort data=have;
by var1 descending avg;
run;

proc sort data=have out=want_highest nodupkey;
by var1;
run;

score 0 · Answer 3 · answered Sep 01 '20 at 23:29

You can compute the averages for each id,t group in a sub-query and from that select highest average over id group.

Example:

data raw;
  call streaminit(123);
  do id = 1 to 100;
    do t = 0 to rand('integer',1,2);
      do rep = 1 to rand('integer', 20);
        y = round(rand('uniform'), 0.01);
        output;
      end;
    end;
  end;
run;

proc sql;
  create table want as
  select id, t, avg
  from (
    select id, t, mean(y) as avg format=5.2
    from raw
    group by id, t
  ) as averages
  group by id
  having avg = max(avg)
  ;

Filtering higher values using proc sql

3 Answers3