Beginner r user here. I have a dataset of yearly employment numbers for different industry classifications and different subregions. For some observations, the number of employees is null. I would like to fill these values through linear interpolation (using na.approx or some other method). However, I only want to interpolate within the same industry classification and subregion.
For example, I have this:
subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry <-c("A","A","A","A","A","B" )
year <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp <- c(50, NA, NA, 80,NA, 300)
data <- data.frame(cbind(subregion,industry,year, emp))
subregion industry year emp
1 East Bay A 2013 50
2 East Bay A 2014 <NA>
3 East Bay A 2015 <NA>
4 East Bay A 2016 80
5 East Bay A 2017 <NA>
6 South Bay B 2002 300
I need to generate this table, skipping interpolating the fifth observation because subregion and industry do not match the previous observation.
subregion industry year emp
1 East Bay A 2013 50
2 East Bay A 2014 60
3 East Bay A 2015 70
4 East Bay A 2016 80
5 East Bay A 2017 <NA>
6 South Bay B 2002 300
Articles like this have been helpful, but I cannot figure out how to adapt the solution to match the requirement that two columns be the same for interpolation to occur, instead of one. Any help would be appreciated.