Interpolation within Groups

Question

Goal

I want to interpolate within groups in a dataframe. This will give me an arbitrary number of intermediate points for each group within a dataframe.

Minimal working example

I have a dataframe like:

OldDataFrame <- data.frame(ID   = c(1,1,1,2,2,2),
                           time = c(1,2,3,1,2,3),
                           Var1 = c(-0.6 , 0.2, -0.8 , 1.6 , 0.3 , -0.8),
                           Var2 = c(0.5 , 0.7, 0.6 , -0.3 , 1.5 , 0.4) )

I want to get a function something like this:

TimeInterpolateByGroup <- function(DataFrame, 
                                   GroupingVariable, 
                                   TimeVariable,
                                   TimeInterval){
  #Something Here
}

It would be handy if I did not have to specify which columns to interpolate, so that the function operates automatically on every numeric column, like numcolwise in plyr.

So that I could apply it like this:

NewDataFrame = TimeInterpolateByGroup(DataFrame         = OldDataFrame,
                                      GroupingVariable  = "ID",
                                      TimeVariable      = "time",
                                      TimeInterval      = 0.25)

to get the NewDataFrame as:

NewDataFrame = data.frame(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  time = c(1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3,
           1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3),
  Var1 = c(-0.6, -0.4, -0.2, 0, 0.2, -0.05, -0.3, -0.55, -0.8,
           1.6, 1.275, 0.95, 0.625, 0.3, 0.025, -0.25, -0.525, -0.8),
  Var2 = c(0.5, 0.55, 0.6, 0.65, 0.7, 0.675, 0.65, 0.625, 0.6,
           -0.3, 0.15, 0.6, 1.05, 1.5, 1.225, 0.95, 0.675, 0.4))

Or in picture form I want to:

[image]

Desired Application

I want an x-y plot of Var1 against Var2 with one point per ID, starting at time = 1. Then I want to use the animate package to show the points moving as time increases. To do this smoothly I need the coordinates at all the intermediate points in time.

The easiest way to do this would be to use time-series interpolation described here: http://stackoverflow.com/questions/16011790/add-missing-xts-zoo-data-with-linear-interpolation-in-r You can do that by group using either `dplyr` or `data.table`. You will have a better chance at getting a response if you show what you have tried first. — ilir, Apr 27 '14 at 20:06
Thanks Ilir. I wanted to avoid time series interpolation because I thought there should be an easier solution. Also I think there could be more general solutions - here it is time but on other occasions you may want to interpolate for distance from a location or something so making a variable into a time series would be confusing. I think James' solution below is better. — Stuart, Apr 27 '14 at 21:08

score 3 · Accepted Answer · edited May 10 '14 at 14:54

I'm fairly sure that the code below gives the correct answer, except for a tiny level of numerical imprecision due to the use of the approx() function. The basic idea is to use ddply to split and combine data frames, and approx to do the interpolation.

library(plyr)

# time_interpolate is a helper function for TimeInterpolateByGroup
# that operates on each of the groups. In the input to this function,
# the GroupingVariable column of the data frame should be single-valued.
# The function returns a (probably longer) data frame, with estimated
# values for the times specified in the output_times array.
time_interpolate <- function(data_frame,
                             GroupingVariable,
                             time_var,
                             output_times) {
  input_times <- data_frame[, time_var]
  exclude_vars <- c(time_var, GroupingVariable)
  value_vars <- setdiff(colnames(data_frame), exclude_vars)
  output_df <- data.frame(rep(data_frame[1,GroupingVariable], length(output_times)), output_times)
  colnames(output_df) <- c(GroupingVariable, time_var)
  for (value_var in value_vars) {
    output_df[,value_var] <- approx(input_times, data_frame[, value_var], output_times)$y
  }
  return(output_df)
}

# A test for time_interpolate
time_interpolate(OldDataFrame[1:3,], "ID" , "time", seq(from=1, to=3, by=0.25))

TimeInterpolateByGroup <- function(DataFrame, 
                                   GroupingVariable, 
                                   TimeVariable,
                                   TimeInterval){
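  # One shared grid of output times spanning the whole data frame.
  # A group whose own time range is narrower gets NA outside that range,
  # because approx() does not extrapolate by default.
  min_time <- min(DataFrame[, TimeVariable])
  max_time <- max(DataFrame[, TimeVariable])
  output_times <- seq(from=min_time, to=max_time, by=TimeInterval)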
  ddply(DataFrame,
        GroupingVariable,
        time_interpolate,
        GroupingVariable=GroupingVariable,
        time_var=TimeVariable,
        output_times=output_times)
}
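
The interpolation step inside time_interpolate comes down to one approx() call per value column. As a minimal sketch, here is that call on the Var1 values of ID 1 from the question. The variable names are illustrative only and are not part of the answer's code.

```r
# Known points for ID 1 in OldDataFrame
input_times <- c(1, 2, 3)
var1_values <- c(-0.6, 0.2, -0.8)

# Times at which interpolated values are wanted
output_times <- seq(from = 1, to = 3, by = 0.25)

# approx() returns a list with components x and y; y holds the
# linearly interpolated values at the requested xout times
interpolated <- approx(x = input_times, y = var1_values, xout = output_times)$y

print(round(interpolated, 3))
```

After rounding, these values match the first nine Var1 entries of the desired NewDataFrame in the question.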

Thanks James. That works out really well. It is nicely self contained too which is handy. I couldn't figure this out for a while so thank you very much for your help — Stuart, Apr 27 '14 at 21:03

score 3 · Answer 2 · answered Apr 27 '14 at 21:22

You may also use na.approx from the zoo package.

library(zoo)
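my_fun <- function(DataFrame, GroupingVariable, TimeVariable, TimeInterval){
  do.call(rbind, by(DataFrame, DataFrame[ , GroupingVariable], function(dat){
    # Full grid of output times for this group, named after TimeVariable
    tt <- data.frame(seq(from = min(dat[ , TimeVariable]),
                         to = max(dat[ , TimeVariable]),
                         by = TimeInterval))
    names(tt) <- TimeVariable
    # Left join: every other column is NA at the newly added times
    dat2 <- merge(tt, dat, by = TimeVariable, all.x = TRUE)
    # na.approx fills those NAs by linear interpolation
    na.approx(dat2)
  }))
}

my_fun(OldDataFrame, "ID", "time", 0.25)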
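
Both answers rely on a package (plyr or zoo). The same split, interpolate, combine idea can probably also be sketched in base R alone. The function below is a hypothetical helper, not taken from either answer. It assumes every group has at least two distinct time points, interpolates each group over its own time range, and only touches numeric columns, as the question asked.

```r
# Hypothetical helper (not from either answer): linear interpolation of
# every numeric column within each group, using only base R.
interpolate_by_group <- function(df, group_var, time_var, time_interval) {
  # Numeric columns other than the grouping and time columns
  is_num <- vapply(df, is.numeric, logical(1))
  value_vars <- setdiff(names(df)[is_num], c(group_var, time_var))

  pieces <- lapply(split(df, df[[group_var]]), function(g) {
    # Each group is interpolated over its own time range
    out_times <- seq(from = min(g[[time_var]]),
                     to   = max(g[[time_var]]),
                     by   = time_interval)
    out <- data.frame(g[[group_var]][1], out_times)
    names(out) <- c(group_var, time_var)
    for (v in value_vars) {
      out[[v]] <- approx(g[[time_var]], g[[v]], xout = out_times)$y
    }
    out
  })

  # Stack the per-group results and drop the group-prefixed row names
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

OldDataFrame <- data.frame(ID   = c(1, 1, 1, 2, 2, 2),
                           time = c(1, 2, 3, 1, 2, 3),
                           Var1 = c(-0.6, 0.2, -0.8, 1.6, 0.3, -0.8),
                           Var2 = c(0.5, 0.7, 0.6, -0.3, 1.5, 0.4))

NewDataFrame <- interpolate_by_group(OldDataFrame, "ID", "time", 0.25)
```

Non-numeric columns are dropped from the output rather than carried along; whether that is the right behaviour depends on the data.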
