0

I have a dataset where some SAS Datastep logic are needed to populate the columns that are missing, or to be derived from exiting columns.

The dataset looks more like the below:

  mpi  v1  v2  v3......v9  v10  v11.....v50
  001      a                    1.324
  002      c                    0.876
  003      f                    11.9
  004      r                    5.7
  005      b                    3.3
  .        .                     .
  .        .                     .   
  n        t                    0.4                   

I actually developed the program below:

 /*a*/
 IF v2 ('a') AND 0 <= v11 <= 2 THEN DO;
 v13 = 1;
 v14 =20;
 END;
 IF v2 IN ('a') AND 2 < v11 <= 3.1 THEN DO;
 v13 = 2;
 v14 =40;
 END;
 IF v2 IN ('a') AND 3.1 < v11<= 5.3 THEN DO;
 v13 = 3;
 v14 =60;     END;
 IF v2 IN ('a') AND 5.3 < v11 <= 11.5 THEN DO;
 v13 = 4;
 v14 =80;
 END;
 IF v2 IN ('a') AND v11 > 11.5 THEN DO;
 v13 = 5;
 v14 =100;
 END;

My request is that I need to write same program to populate v13 and v14 when v2 IN c, f, t, r, etc; but of different parameters for the bound in v11 (separate for c, e, g,...) while v13 and v14 remain the same for the categories.

I would like to use SAS macro to get this done to avoid repetition of program. Can you help out on this?

CharlesB
  • 86,532
  • 28
  • 194
  • 218
Dapsy
  • 1
  • 3

1 Answers1

0

The best way to do this is to create a dataset with the values of v2,v11,v13,v14, and merge it on or otherwise combine it with your dataset.

Doing that is a little more complicated when you have a range for a value, but by no means impossible.

Let's say you have a dataset, with v2, v11min, v11max, v13, and v14.

data mergeon;
input v2 $ v11min v11max v13 v14;
datalines;
a 0 2 1 20
a 2 3.1 2 40
a 3.1 5.3 3 60
a 5.3 11.5 4 80
a 11.5 9999 5 100
c 0 4 1 20
c 4 8.1 2 40
c 8.1 9.6 3 60
c 9.6 13.5 4 80
c 13.5 9999 5 100
;;;;
run;

data have;
input mpi v2 $ v11 v13 v14;
datalines;
1 a 2 0 0
2 a 4 0 0
3 c 1 0 0
4 c 7 0 0
5 c 9 0 0
6 a 22 0 0
7 a 10 0 0
;;;;
run;



proc sql;
create table want as
  select H.mpi, H.v2, H.v11, coalesce(M.v13,H.v13) as v13, coalesce(M.v14,H.v14) as v14
   from have H 
   left join mergeon M
   on H.v2=M.v2
   and M.v11min < H.v11 <= M.v11max
;
quit;

COALESCE chooses the first nonmissing value, meaning it will keep the H.v13 value only when M.v13 is missing (so, when the merge fails to find a record in the mergeon table).

If you aren't comfortable with SQL, you can also use a few other options; a hash table is probably the easiest, though you may also be able to use an update statement (not as familiar with those myself).

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thanks Joe for the insightful program, I want to believe that crafting a macro out for the program is impossible. – Dapsy Feb 19 '13 at 19:14