Remove records based on conditions in SAS

Question

I am completely new to SAS and I need to do a simple task. say I have a table like:

I need to remove all instances of sampCode for which at least one SampCode reflect the condition: 2*res > max. that is, not just removing the row, but the entire sampCode group based on that condition. thanks for helping!

And what is sampCode group? Your sampCode has a different value for every row. Show desired output. — Negdo, Dec 16 '22 at 12:51
you are right. I have replaced the table. Now, for the second row it results> ```2*res > max```, thus the output should not contain any of the `2014_AT14012534-001` entry. hope it is clearer now. thanks — efz, Dec 16 '22 at 13:37

Kermit · Accepted Answer · 2022-12-16T14:44:03.900

Try using a double DoW Loop.
It assumes your data set is sorted by sampCode.

data want;
    _keep=1;
    
    do _n_=1 by 1 until (last.sampCode);
        set have;
        by sampCode;
        if 2*res>max then _keep=0;
    end;
    
    do _n_=1 by 1 until (last.sampCode);
        set have;
        by sampCode;
        if _keep=1 then output;
    end;
    
    drop _keep;
run;

As a result, 2014_AT14012534-001 and 2014_AT14044069-001 are removed.

       sampCode        res   max
 2014_AT14036758-001  0.01  0.066
 2014_AT14051994-001  0.01  0.021
 2014_BE2549-14-0021  0.01   0.33
 2014_BE2549-14-0023  0.01   0.06
 2014_BE3013-14-0118  0.01  0.044
 2014_BE3259-14-0019  0.01    0.1
 2014_BE3259-14-0101  0.01  0.037
 2014_BE3320-14-0200  0.01  0.038
 2014_BE3365-14-0005  0.01  0.021
 2014_BE4040-14-0548 0.005   0.11
 2014_BE4685-14-0018  0.01  0.054
 2014_BE4804-14-0057  0.01   0.18
 2014_BE4824-14-0007  0.01   0.03

For an SQL approach, try

proc sql;
    create table want as
    select sampCode, res, max 
    from (select sampCode, res, max, 
            max(case when 2*res>max then 1 else 0 end) as _max 
            from have 
            group by sampCode 
            having _max=0);
quit;

If there is a sampCode that repeats (like 2014_AT14012534-001) you won't have all observations. You need double DoW loop. — Negdo, Dec 16 '22 at 14:27
For his given example it does not matter but in the general case indeed. Thank you ! — Kermit, Dec 16 '22 at 14:35
The first loop index `_n_` will have computed the group size at the end of the loop. The second loop can be a simple `_n_ = 1 to _n_;` and does not need a `BY` statement within. — Richard, Dec 16 '22 at 15:53

score 1 · Answer 2 · answered Dec 16 '22 at 15:16

To test if a boolean expression is ever true over a series of values just take the MAX(). SAS evaluates boolean expressions to 1 (true) or 0 (false). So the MAX() is true if any value is true. In your case you want to keep the observations where the condition is never met, so when the MAX() is false.

PROC SQL will happily remerge summary statistics onto all of the detailed observations.

select * 
  from have
  group by sampcode
  having not max(2*res > max)
;

score 0 · Answer 3 · answered Dec 16 '22 at 14:26

You have to use double DOW, otherwise you only get one observation per sampCode, even in cases where there are numerous observations with the same sampCode.

  data have;
        informat sampCode $40.;
        input sampCode res max;
        datalines;
    2014_AT14012534-001 0.01 0.034
    2014_AT14012534-001 2 0.144
    2014_AT14012534-001 0.01 0.015
    2014_AT14012534-001 0.01 0.075
    2014_AT14012534-002 0.01 0.034
    2014_AT14012534-002 0.01 0.314
    2014_AT14012534-003 0.01 0.034
    2014_AT14012534-003 0.02 0.934
    2014_AT14012534-003 0.01 0.034
    2014_AT14012534-004 3 0.001
    run;



data want;
    keep=1;
    
    do until (last.sampCode);
        set have;
        by sampCode;
        if 2*res>max then keep=0;
    end;
    do until (last.sampCode);
        set have;
        by sampCode;
        if keep=1 then output;
    end;
    
    drop keep;
run;

Many thanks. I-ll give it a try, but as my data have some 4M lines, a do approach seems too expensive though...I thought it could be done with some sql procedure — efz, Dec 16 '22 at 14:36
@efz In many cases, the DoW loop is more efficient than using the SQL Procedure. I edited my answer with an SQL approach anyway. — Kermit, Dec 16 '22 at 14:51
next time, should you be looking for an SQL answer, please mention it in the question and add the `proc-sql` tag — Kermit, Dec 16 '22 at 15:08

Remove records based on conditions in SAS

3 Answers3