1

I have three columns indicating the name of a center, the center's state, and whether or not the state associated with the center. I am trying to see if a center is part of a chain that operates only in intervention states, never in intervention states, or in intervention and non-intervention states.

data df;
     input $ center state   intervention   chain;
             Center1  CA      1            chain1
             Center2  AZ      0            chain1
             Center3  PA      0            chain2
             Center4  HI      0            chain2
             Center5  CA      1            chain3
             Center6  CA      1            chain3;
run;

What is the best way to do this? I have tried creating three separate tables for intervention, nonintervention, and both that list the chains by state of operation. I then created dummy variables in each that indicate whether they operate in intervention, nonintervention, or both states. I then merged them back into the original table but there was overlap between the intervention/both tables, as well as overlap between the nonintervention/both tables.

Joe
  • 62,789
  • 6
  • 49
  • 67

1 Answers1

1

I like a non-sql solution for this; it is doable in SQL, but it's so easy outside of it, comparatively.

data df;
     input center $ state   $ intervention   chain $;
     datalines;
             Center1  CA      1            chain1
             Center2  AZ      0            chain1
             Center3  PA      0            chain2
             Center4  HI      0            chain2
             Center5  CA      1            chain3
             Center6  CA      1            chain3
             ;
run;
proc sort data=df;
  by chain intervention;
run;

data want;
  do _n_ = 1 by 1 until (last.chain);     *first, check to see if it is mixed or int or nonint;
    set df;
    by chain intervention;                *by intervention lets us easily see mixed;
    length int_type $16;
    if last.intervention and not last.chain then mixed=1;   *if we see a row that is the last row for a by-group of intervention, but not chain, then we know this chain is mixed;
    if last.chain and mixed=1 then int_type='Mixed';
    else if intervention then int_type='Intervention';
    else int_type='Non-Intervention';
  end;
  do _n_ = 1 by 1 until (last.chain);   *now run through the data set a second time to put the int_type variable on the dataset;
    set df;
    by chain;
    output;
  end;
run;
    

This is the DoW loop, and iterates twice through the dataset, first checking if the conditions for mixed are met, if so then it marks it mixed, if not it marks it based on the int_type of the last record.

If your dataset is so large that a sort is not desirable, this can be done in other ways without sorting, but this should be relatively fast even with the sort. If it's already sorted by CHAIN but not by CHAIN INTERVENTION, it could be done solely sorted by CHAIN, just with a bit more logic.

Joe
  • 62,789
  • 6
  • 49
  • 67