0

enter image description here

data have;
    length col1-col10 $2.;
    input col1-col10;
    datalines;
PM MM JM MM PM MM MM PB . .
MM JM PM MB . . . . . .
MM MM PM JM PB MM . . . .
PM PM PM MM MB MM JM . . .
;
run;

Goal: Hi all,

My goal here is that I want to flag all records which have values other than 'PB' or 'MB' after them in sequence across the columns.

Any help is appreciated. Thanks!

Expected output:

                           Flag
PM MM JM MM PM MM MM PB . .  0
MM JM PM MB . . . . . .      0
MM MM PM JM PB MM . . . .    1
PM PM PM MM MB MM JM . . .   1
Richard
  • 25,390
  • 3
  • 25
  • 38

2 Answers2

1

After scratching the memory banks I remembered that naming syntax can list variables in reverse order!

I read the manuals way...

data work.want;
  set work.have;
  flag = cats(of col10-col1) not in: ('PB', 'MB');
run;

Seems like an obvious way...

If all rows have at least one PB or MB you can loop from end of array to 1 and flag if first value is not PB or MB.

If a row can be without either PB or MB you would have to check for that as well.

Example:

Leave checking loop as soon as first value encountered is evaluated for flag condition.

data want;
  set have;
  array col col1-col10;
  do i = dim(col) to 1 by -1;
    if not missing(col(i)) then do;
      flag = col(i) not in ('PB', 'MB');
      leave;
    end;
  end;
Richard
  • 25,390
  • 3
  • 25
  • 38
1

You can use one array and whichc to find the index of where MB or PB is in the columns and then you can find the first non missing column index and then make a flag off these two variables. This will work if all the missing is at the end (as coded in your example).

data want1;
set have;
array s(*)  col1-col10 ;
max=max(whichc('PB', of s[*]),whichc('MB', of s[*])) ;
max2=whichc(' ', of s[*])-1 ;
if max2 > max then fl = 1;
else fl =0;
run;
Mike
  • 3,797
  • 1
  • 11
  • 30