4

I have a data.frame that contains quarterly observations. I now want to interpolate monthly values (preferred cubic, linear is fine). The intermediate goal should be to create a data.frame with DATE as the index and missing values for all the monthly observations.

Googling showed that I should create an empty data.frame for the whole time range and then merge it - but what ever I tried so far gave me errors. Here's my procedure; but since I'm a newb to r, I'm open to any suggestions for changes.

> str(ger)
'data.frame':   93 obs. of  2 variables:
 $ DATE : Date, format: "1991-01-01" "1991-04-01" "1991-07-01" "1991-10-01" ...
 $ VALUE: num  470780 468834 466332 472949 480359 ...
> head(ger)
        DATE    VALUE
1 1991-01-01 470780.3
2 1991-04-01 468834.0
3 1991-07-01 466331.6
4 1991-10-01 472949.0
5 1992-01-01 480359.2
6 1992-04-01 476744.5

emptyIndex <- seq(ger[1, 'DATE'], tail(ger[, 'DATE'], 1), by='1 month')
gerMonthly <- data.frame(DATE = emptyIndex, VALUE = NA)
merge(ger, gerMonthly, by='DATE', all.y = T)

This is the closest I got, but it gives me an undesired column format - there surely is a cleaner way to get what I want? Finally, given the format, what would be the cleanest way to get the interpolated time series?

          DATE  VALUE.x VALUE.y
1   1991-01-01 470780.3      NA
2   1991-02-01       NA      NA
3   1991-03-01       NA      NA
4   1991-04-01 468834.0      NA
5   1991-05-01       NA      NA
6   1991-06-01       NA      NA
FooBar
  • 15,724
  • 19
  • 82
  • 171

2 Answers2

6

I'm not quite clear on your comment about the undesired column format but if you're trying to get the interpolated values using a cubic interpolation, you might consider something like the code below

ger <- data.frame(DATE= as.Date(c("1991-01-01", "1991-04-01", "1991-07-01", "1991-10-01", "1992-01-01" )),
              +                   VALUE= c(470780, 468834, 466332, 472949, 480359))
DateSeq <- seq(ger$DATE[1],tail(ger$DATE,1),by="1 month")
gerMonthly <- data.frame(DATE=DateSeq, Interp.Value=spline(ger, method="natural", xout=DateSeq)$y)
merge(ger, gerMonthly, by='DATE', all.y = T)

The DATE column needs to be in Date format so the interpolation can work with numeric values. I've usually used "natural" cubic splines but other options are available. This format shows both the input values and the results so that you can check that the interpolation looks reasonable but you can use gerMonthly if you just want the interpolated results.

WaltS
  • 5,410
  • 2
  • 18
  • 24
1

Reading your code has really helped me. To obtain the interpolated values, I did this:

library(tseries)
library(zoo)

# the last line of your code, named for convenience.

merged_data < - merge(ger, gerMonthly, by='DATE', all.y = T)

# declare your desired variable as a time series, 

monthly_data <- ts(merged_data$VALUE.x, start = c(1991, 1), end = c(1998,10), frequency = 12)

# interpolate: na.approx linearly interpolates NA values in a time series, na.spline will use cubic spline interpolation.

na.approx(monthly_data) # or:
na.spline(monthly_data)
Martin Gal
  • 16,640
  • 5
  • 21
  • 39