Conditionally delete the most recently inserted observation in SAS

Question

I have two tables A and B that look like below.

Table A

rowno flag1 flag2 flag3 1 1 0 0 2 0 1 1 3 0 0 0 4 0 1 1 5 0 0 1 6 0 0 0 7 0 0 0 8 0 1 0 9 0 0 0 10 1 0 0

Table B

rowno flag1 flag2 flag3

Table A and B have the same column names but B is an empty table initially.

So what I want to accomplish is to insert the values from A to B row by row using macro, iteration by rowno. And each time I insert one row from A to B, I want to calculate the sum of each flag column.

If after insert each row, the sum(flag1) > 1 or sum(flag2) >1 or sum(flag3) >1, I need to delete that inserted row from table B. Then the iteration keeps running till the end of the observation in Table A. The final output in Table B is to have 5 observations from table A.

the code I have so far is below:

%macro iteration;

%do rowno=1 %to 10;

proc sql;

insert into table.B
select *
from table.A
where rowno = &rowno;

quit;

set table.B;

if
sum(flag1) > 1
or
sum(flag2) > 1
or
sum(flag3) > 1

then delete;

run;

%end;
%mend iteration;

%iteration

I received a lot of error messages.

Looking forward to your help and suggestions. Thanks.

The ideal output data would look like this

 rowno  flag1  flag2  flag3
    1    1      0      0
    2    0      1      1
    3    0      0      0
    6    0      0      0
    7    0      0      0

If you values are always 0/1 isn't that equivalent of checking if any of the variables about to be inserted are 1 and then not inserting? Your also not actually calculating a sum so that's why your code doesn't work. You need to embed your if sum(flag)>1 in a data step or proc sql logic that is correct. — Reeza, Sep 08 '15 at 15:54
not always. for example, if the first insertion is 0, then it will accept the 2nd one if it's 1, but not the 3rd one. Or if the first 30 observations are 0, then it can accept the 31st one if it's 1 but not the 32nd. Thank you for your input. I'll modify the logic part and put it into data step. — dozel, Sep 08 '15 at 16:00
You are not clear what you are summing over. If you are just talking about the single observation then there is no summing to be done. If you are talking about summing over multiple observations then which observations? — Tom, Sep 08 '15 at 16:03
What would you expect to be the end result of applying your algorithm to the sample data you provided? — Tom, Sep 08 '15 at 16:04
the sum is to sum the values in each flag column after each insertion. If the first insertion and it's flag value is 0, and the 2nd insertion with value 1, I'll keep the 2nd insertion because now the sum is (0 + 1) = 1. But if the 3rd insertion with flag value 1, the sum would be (0 + 1 + 1) > 1, so I'll have to delete the 3rd insertion. And so on until the end of table A. — dozel, Sep 08 '15 at 16:08
Post some sample input/output data. I don't think you need a macro but I'm too lazy to generate fake data — Reeza, Sep 08 '15 at 16:08
Re Tom: Table A has 30 observations. The end result is to have the best 5 observations from A and stored them into B. By "the best" I mean the observation meets the constrains. — dozel, Sep 08 '15 at 16:09
Sounds like you could just use an SQL max() function? ie. Return the highest number for each column? — Robert Penridge, Sep 08 '15 at 16:10
MAX() function? The highest number is 1 and lowest number is 0. And I'm looking for the highest number here. — dozel, Sep 08 '15 at 16:13

score 0 · Accepted Answer · answered Sep 08 '15 at 16:32

0

Instead of a macro, use a running sum to calculate the running sum of each row. If you need to delete a row remember to reverse the increment to the running sum. Based on your data, I think Row 9 should also be kept.

data TableA;
input rowno  flag1  flag2  flag3;
cards;
    1    1      0      0
    2    0      1      1
    3    0      0      0
    4    0      1      1
    5    0      0      1
    6    0      0      0
    7    0      0      0
    8    0      1      0
    9    0      0      0
   10    1      0      0
;
run;

data TableB;
set TableA;
retain sum_:;

*Increment running sum for flag;
sum_flag1+flag1;
sum_flag2+flag2;
sum_flag3+flag3;

*Check flag amounts;
if sum_flag1>1 or sum_flag2>1 or sum_flag3>1 then do;
    *if flag is tripped then delete increment to flag and remove record;
    sum_flag1 +-flag1;
    sum_flag2 +-flag2;
    sum_flag3 +-flag3;
    delete;
end;

run;

answered Sep 08 '15 at 16:32

Reeza

20,510
4
21
38

Wow. That's awesome. Thank you Reeza. – dozel Sep 08 '15 at 16:49
May I ask you why you used " +- " to delete increment and not just the minus sign " - "? I tried to use " - ", and got error message. – dozel Sep 08 '15 at 17:12
The minus sign only isn't valid SAS syntax - the `SUM` statement with implicit retain requires a +, the - converts that to a negative. – Reeza Sep 08 '15 at 17:14
oh I see. So you are adding a negative flag1, 2, 3 which is the same as minus. – dozel Sep 08 '15 at 17:28
Sorry, I have one last question. The retain sum_: ; statement. I understand that the colon : works as an operator modifier in this case. But I read the SAS document says that in order to pick up any string that starts with a certain pattern, we need to put the colon in front of the pattern text. Such as retain :sum_ But I don't know why it didn't work. Thank you! – dozel Sep 08 '15 at 17:37
Oh wait. it's called variable name wild card. It should be after the pattern text. Sorry for the bother! – dozel Sep 08 '15 at 17:39
The colon isn't an operator modifier in this case, there is no operator. Its a method of short hand notation see variable lists section in documentation. Also, the retain is probably not needed. – Reeza Sep 08 '15 at 17:40

Conditionally delete the most recently inserted observation in SAS

1 Answers1