I am trying to expand yearly values in my panel data to year-quarter values. That is repeat the yearly values to every quarter. For e.g., I am looking to get the repeated values of income for year-quarter 2000Q1, 2000Q2, 2000Q3, 2000Q4, 2001Q1, ... , 2001Q4. So the data frame would be id,year-quarter, income.
I use a two step approach but have some issues to handle. If the quarterly starting value is missing, then I would then need to the quarterly to be missing (NA) too.
Case 1:
annual_data <- data.frame(
person=c(1, 1, 1, 2, 2,2),
year=c(2010, 2011, 2012, 2010, 2011, 2012),
income=c(4, 10, 13, 1, NA, 30)
)
Case 2:
annual_data <- data.frame(
person=c(1, 1, 1, 2, 2,2),
year=c(2010, 2011, 2012, 2010, 2011, 2012),
income=c(4, 10, 13, NA, NA, 30)
)
In the first step, I expand the data to quarterly as was mentioned: interpolating in R yearly time series data with quarterly values
So use a function such as:
expand <- function(x) {
years <- min(x$year):max(x$year)
quarters <- 1:4
grid <- expand.grid(quarter=quarters, year=years)
x$quarter <- 1
merged <- grid %>% left_join(x, by=c('year', 'quarter'))
merged$person <- x$person[1]
return(merged)
}
Then I used in
zoo::na.locf
dplyr::mutate.
quarterlydata <- annual_data %>% group_by(person) %>% do(expand(.))
testdata <- quarterlydata %>% group_by(person) %>% mutate(ynew=zoo::na.locf(y))
but havent had much luck as it copies forward to all missing values from the previous non-missing values. That is,
Case 1: it copies all values, So income of 1 for person 2 gets copied over to 2010 and 2011. When it must be copied over to just 2010, and 2011 should be NAs.
For case 2: I get
Error: incompatible size (%d), expecting %d (the group size) or 1.
Any thoughts on where I am missing?