0

Say we are confined to using SAS and have a panel/longitudinal dataset. We have indicators for cohort and time, as well as some measured variable y.

data in;
input cohort time y;
datalines;
1   1  100 
1   2  101  
1   3  102 
1   4  103
1   5  104
1   6  105 
2   2   .  
2   3   .  
2   4   .  
2   5   .
2   6   .
3   3   .
3   4   .
3   5   .
3   6   .
4   4  108
4   5  110
4   6  112
run;

Note that units of cohort and time are the same so that if the dataset goes out to time unit 6, each successive panel unit will be one period shorter than the one before it in time.

We have a gap of two panel units between actual data. The goal is to linearly interpolate the two missing panel units (values for cohort 2 and 3) from the two that "sandwich" them. For cohort 2 at time 5 the interpolated value should be 0.67*104 + 0.33*110, while for cohort 3 at time 5 it would be 0.33*104 + 0.67*110. Basically you just weight 2/3 for the closer panel unit with actuals, and 1/3 for the further panel unit. You'll of course have missing values, but for this toy example that's not a problem.

I'm imagining the solution involves lagging and using the first. operator and loops but my SAS is so poor I hesitate to provide even my broken code example.

JPErwin
  • 223
  • 1
  • 10

1 Answers1

0

I've got a solution, it is however tortured. There must be a better way to do it, this takes one line in Stata.

First we use proc SQL to make a table of the two populated panel units, the "bread of the sandwich"

proc sql; 
create table haveY as
select time, cohort, y
from startingData 
where y is not missing
order by time, cohort;

quit;

Next we loop over the rows of this reduced dataset to produce interpolated values, I don't completely follow the operations here, I modified a related example I found.

data wantY;
set haveY(rename=(y=thisY cohort=thisCohort));
by time;

retain lastCohort lastY;
lastcohort = lag(thisCohort);
lastY = lag(thisY);
if not first.time then do;
  do cohort = lastCohort +1 to thisCohort-1;
    y = ((thisCohort-cohort)*lastY + (cohort-lastCohort)*thisY)/(thisCohort-lastCohort);
    output;
    end;
end;
cohort=thisCohort;
y=thisY;
drop this: last:;
run;

proc sort data=work.wantY;
by cohort time;
run;

This does produce what is needed, it can be joined using proc sql into the starting table: startingData. Not a completely satisfying solution due to the verbosity but it does work.

JPErwin
  • 223
  • 1
  • 10