1

Let's say I have data that look like this:

DATA temp;
  INPUT id a1 b2 d1 f8;
DATALINES;
1 2.3 2.1 4.2 1.2 
2 5.3 2.3 1.5 3.2
3 1.2 5.4 6.6 6.6
; 
run;

What I want to do is use the data and set statements to say that if the values in a1 and f8 are less than the means of a1 and f8 (respectively), then those values are missing. So the resulting dataset would look like:

 id   a1   b2   d1   f8
  1    .  2.1  4.2    .
  2  5.3  2.3  1.5    .
  3    .  5.4  6.6  6.6

Any tips for how I would start on this? I'm new to SAS and the examples in the manuals have not been very helpful. I had been thinking of something like this (but it doesn't work):

DATA temp2;
    SET temp;
        IF a1 < mean(a1) THEN a1=.;
        IF f8 < mean(f8) THEN f8=.;
 RUN;

2 Answers2

1

The SAS implementation of SQL can do automatic application of group or data wise aggregates against a result set.

Proc SQL;
  create table want as
  select
    case when (a1 < mean(a1)) then . else a1 as a1,
    b2,
    d1,
    case when (f8 < mean(f8)) then . else f8 as f8
  from have;

A solution that uses DATA step will need to precompute data set statistics, commonly with a procedure such as MEANS, SUMMARY or UNIVARIATE.

proc means noprint data=have;
  output out=have_means mean(a1 f8)= / autoname;
run;

data want;
  if _n_ = 1 then do;
    set have_means(keep=a1_mean f8_mean);
  end;

  set have;
  if a1 < a1_mean then a1 = .;
  if f8 < f8_mean then f8 = .;

  drop a1_mean f8_mean;
run;

Other techniques can update a data set in place and would use SQL UPDATE or DATA step MODIFY

Richard
  • 25,390
  • 3
  • 25
  • 38
0

mean function is applied across the row and not within the column in datastep, that is why you are not getting the results. @Richard answer is perfect. to do in datastep to get mean, you need to use DOW loop and then append with main dataset. It is much easier to use proc summary as @Richard explains.

    data temp2_intial(keep= mean_a1 mean_f8);
   do until(eof);
   set temp end =eof;
   tot_a1 = sum(tot_a1, a1);
   cnt_a1=sum(cnt_a1,1);
   mean_a1 = tot_a1/cnt_a1;

  tot_f8 = sum(tot_f8, f8);
  cnt_f8=sum(cnt_f8,1);
  mean_f8 = tot_f8/cnt_f8;
    end;
  run;

  data temp2(drop= mean_a1 mean_f8);
  set temp ;
  if _n_ =1 then set temp2_intial;
  IF a1 < mean_a1 THEN a1=. ;
 IF f8 < mean_f8 THEN f8=.;
  run;
Kiran
  • 3,255
  • 3
  • 14
  • 21