2

I am trying to expand yearly values in my panel data to year-quarter values. That is repeat the yearly values to every quarter. For e.g., I am looking to get the repeated values of income for year-quarter 2000Q1, 2000Q2, 2000Q3, 2000Q4, 2001Q1, ... , 2001Q4. So the data frame would be id,year-quarter, income.

I use a two step approach but have some issues to handle. If the quarterly starting value is missing, then I would then need to the quarterly to be missing (NA) too.

Case 1:

annual_data <- data.frame(
person=c(1, 1, 1, 2, 2,2),
year=c(2010, 2011, 2012, 2010, 2011, 2012),
income=c(4, 10, 13, 1, NA, 30)
)

Case 2:

annual_data <- data.frame(
person=c(1, 1, 1, 2, 2,2),
year=c(2010, 2011, 2012, 2010, 2011, 2012),
income=c(4, 10, 13, NA, NA, 30)
)

In the first step, I expand the data to quarterly as was mentioned: interpolating in R yearly time series data with quarterly values

So use a function such as:

expand <- function(x) {
    years <- min(x$year):max(x$year)
    quarters <- 1:4
    grid <- expand.grid(quarter=quarters, year=years)
    x$quarter <- 1
    merged <- grid %>% left_join(x, by=c('year', 'quarter'))
    merged$person <- x$person[1]
    return(merged)
   }

Then I used in

    zoo::na.locf 

    dplyr::mutate.

quarterlydata <- annual_data %>% group_by(person) %>% do(expand(.))
testdata <- quarterlydata %>% group_by(person) %>% mutate(ynew=zoo::na.locf(y))

but havent had much luck as it copies forward to all missing values from the previous non-missing values. That is,

Case 1: it copies all values, So income of 1 for person 2 gets copied over to 2010 and 2011. When it must be copied over to just 2010, and 2011 should be NAs.

For case 2: I get

 Error: incompatible size (%d), expecting %d (the group size) or 1.

Any thoughts on where I am missing?

Community
  • 1
  • 1
rajvijay
  • 1,641
  • 4
  • 23
  • 28

1 Answers1

3

For case 1 you are missing the year in your group_by. Since using the code that you have, the groupings for na.locf thinks that year is part of the grouping which na.locf must run over.

testdata <- quarterlydata %>% 
  group_by(person, year) %>% 
  mutate(ynew=zoo::na.locf(income, na.rm=FALSE))

With the output:

> tail(testdata, 13)
Source: local data frame [13 x 5]
Groups: person, year

   quarter year person income ynew
1        4 2012      1     NA   13
2        1 2010      2      1    1
3        2 2010      2     NA    1
4        3 2010      2     NA    1
5        4 2010      2     NA    1
6        1 2011      2     NA   NA
7        2 2011      2     NA   NA
8        3 2011      2     NA   NA
9        4 2011      2     NA   NA
10       1 2012      2     30   30
11       2 2012      2     NA   30
12       3 2012      2     NA   30
13       4 2012      2     NA   30

For case 2, as you might already infer from the code above, you must have na.rm set to FALSE otherwise the vector will drop off all NA which it could not extrapolate.

So using exactly the same code for case 2 we will have the output:

> tail(testdata, 13)
Source: local data frame [13 x 5]
Groups: person, year

   quarter year person income ynew
1        4 2012      1     NA   13
2        1 2010      2     NA   NA
3        2 2010      2     NA   NA
4        3 2010      2     NA   NA
5        4 2010      2     NA   NA
6        1 2011      2     NA   NA
7        2 2011      2     NA   NA
8        3 2011      2     NA   NA
9        4 2011      2     NA   NA
10       1 2012      2     30   30
11       2 2012      2     NA   30
12       3 2012      2     NA   30
13       4 2012      2     NA   30
chappers
  • 2,415
  • 14
  • 16