I have panel data and numerous variables are missing observations before certain years. The years vary across variables. What is an efficient way to extrapolate for missing data points across multiple columns? I'm thinking of something as simple as extrapolation from a linear trend, but I'm hoping to find an efficient way to apply the prediction to multiple columns. Below is a sample data set with missingness similar to what I'm dealing with. In this example, I'm hoping to fill in the NA values in the "National GDP" and "National Life Expectancy" variables using a linear trend calculated with the observed data points in each column.
###Simulate National GDP values
set.seed(42)
nat_gdp <- c(replicate(20L, {
foo <- rnorm(3, mean = 2000, sd = 300) + c(0,1000,2000)
c(NA,NA,foo)}))
###Simulate national life expectancy values
nat_life <- c(replicate(20L, {
foo <- rnorm(2, mean = 55, sd = 7.8) + c(0,1.5)
c(NA,NA,NA,foo)}))
###Construct the data.table
data.sim <- data.table( GovernorateID = c(rep(seq.int(11L,15L,by=1L), each = 20)),
DistrictID =rep(seq.int(1100,1500,by=100),each=20 ) + rep(seq_len(4), each = 5),
Year = seq.int(1990,1994,by=1L),
National_gdp = nat_gdp ,
National_life_exp = nat_life )