0

I'm working with a nested list (ground sensors) of lists (measurement depths) of lists (data frames for each year from 2014-2018) and I want to perform linear interpolation for each of those data frames. Here is an overview of the dataset, just so you can get an idea what it looks like:

str(G1OUT_gwFERN) 

$ SE13 :List of 3
  ..$ d20:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 46 45.9 46 45.9 45.9 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 49.8 49.8 49.8 49.8 49.8 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 48.2 48.2 48.1 48.1 48.1 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  ..$ d50:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 35.2 35.2 35.2 35.2 35.2 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 34.8 34.8 34.7 34.7 34.8 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 34.2 34.2 34.1 34.1 34.1 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 36.4 36.4 36.3 36.3 36.3 ...
  ..$ d5 :List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.5 32.4 32.4 32.4 32.4 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.1 32.1 32.1 32.1 32.1 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 30.3 30.3 30.3 30.2 30.2 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 31.1 31.2 31.1 31.1 31.1 ...
 $ SE14 :List of 3
  ..$ d20:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 52.5 52.5 52.5 52.5 52.4 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 53.7 53.7 53.7 53.7 53.7 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 52.3 52.2 52.3 52.3 52.2 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 55 55 55 55.1 55 ...
  ..$ d50:List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 27.9 27.9 27.9 27.9 27.9 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 28.5 28.5 28.5 28.5 28.5 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 26.7 26.7 26.7 26.6 26.7 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 29.4 29.4 29.4 29.4 29.5 ...
  ..$ d5 :List of 5
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 39.8 39.8 39.7 39.6 39.7 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 42.2 42.3 42.3 42.3 42.3 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 36.6 36.6 36.5 36.6 36.5 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 56.5 56.5 56.5 56.5 56.3 ...

I extracted a part of one of the dataframes from the list and used dput() so you have some toy data to work with:

    toydat <-  structure(list(Date = structure(c(16277, 16277, 16277, 16277, 
16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277, 16277, 
16277, 16277, 16277, 16277, 16277, 16277, 16278, 16278, 16278, 
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 
16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 16278, 
16278, 16278, 16278, 16279, 16279, 16279, 16279, 16279, 16279, 
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 
16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 16279, 
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 
16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 16280, 
16280, 16280, 16280, 16280, 16280, 16280, 16281, 16281, 16281, 
16281, 16281, 16281, 16281, 16281, 16281, 16281), class = "Date"), 
    SWC = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 19.627243, 19.543659, 
    19.593796, 19.534379, 19.59937, 19.51582, 19.482441, 19.51582, 
    19.571497, 19.645825, 20.83435, 21.116572, 22.688702, 22.216629, 
    21.54243, 21.229946, 21.003335, 20.833735, 20.74902, 20.608045, 
    20.512311, 20.411049)), row.names = 48774:48874, class = "data.frame")

The measurements were taken hourly, so I have 24 measurements for one day. Some values in the dataframes are NoData values though, so I want to fill these gaps using linear interpolation. However, I only want to use linear interpolation if the gap between the NoData values and actual values is not bigger than 2 days. In regard to the toy data this means that if the values are missing for the 28th and 29th of July (2014-07-28 and 2014-07-29) I only want to fill the gaps for these days and not for the 27th, 26th, 25th... July and so on. If the gaps are larger than 2 days I want to keep the NoData values as I'll use linear regression later on to fill these gaps but that should not be topic of this post.

I already tried the following things:

I used the na.approx() function from the zoo package. I typed:

na.approx(toydat$SWC, na.rm = FALSE)

but that just returns the data as it was before and does not interpolate (I typed $SWC because I only want to interpolate that column). I figured that if I add rule = 2 to the code that it takes the last value after the NaN values and just puts that value for all of the NaN values which is not what I want. I also tried using maxgap = 48 because I thought that would make sure that only 48 values get interpolated. However, as I couldn't manage to interpolate properly anyway, nothing happened.

I would really appreciate some help.

Phil
  • 85
  • 7

2 Answers2

0

Here's a messy one using Base R that will work on na values at the start of the data.frame and extrapolate out for na values at the end of a data.frame, it does assume the frequency is the same. Note: df_list is a proxy for your list

# Linear interpolation function handling ties,
# returns interpolated vector the same length 
# a the input vector: -> vector
l_interp_vec <- function(na_vec){
  approx(x = na_vec, method = "linear", ties = "constant", n = length(na_vec))$y
}

# Applied to a dataframe, replacing NA values
# in each of the numeric vectors,
# with interpolated values.
# input is dataframe: -> dataframe()
interped_df <- function(df){
  data.frame(lapply(df, function(x) {
  if (is.numeric(x)) {
    # Store a scalar of min row where x isn't NA: -> min_non_na
    min_non_na <- min(which(!(is.na(x))))
    # Store a scalar of max row where x isn't NA: -> max_non_na
    max_non_na <- max(which(!(is.na(x))))
    # Store scalar of the number of rows needed to impute prior
    # to first NA value: -> ru_lower
    ru_lower <- ifelse(min_non_na > 1, min_non_na - 1, min_non_na)
    # Store scalar of the number of rows needed to impute after
    # the last non-NA value: -> ru_upper
    ru_upper <- ifelse(max_non_na == length(x),
                       length(x) - 1,
                       (length(x) - (max_non_na + 1)))

    # Store a vector of the ramp to function: -> l_ramp_up:
    ramp_up <-
      as.numeric(cumsum(rep(x[min_non_na] / (min_non_na), ru_lower)))

    # Apply the interpolation function on vector "x": -> y
    y <-
      as.numeric(l_interp_vec(as.numeric(x[min_non_na:max_non_na])))

    # Create a vector that combines the ramp_up vector
    # and y if the first NA is at row 1: -> z
    if (length(ramp_up) > 1 & max_non_na != length(x)) {
      # Create a vector interpolations if there are
      # multiple NA values after the last value: -> lower_l_int
      lower_l_int <-
        as.numeric(cumsum(rep(mean(diff(
          c(ramp_up, y)
        )),
        ru_upper + 1)) +
          as.numeric(x[max_non_na]))

      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(ramp_up, y, lower_l_int))

    } else if (length(ramp_up) > 1 & max_non_na == length(x)) {
      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(ramp_up, y))

    } else if (min_non_na == 1 & max_non_na != length(x)) {
      # Create a vector interpolations if there are
      # multiple NA values after the last value: -> lower_l_int
      lower_l_int <-
        as.numeric(cumsum(rep(mean(diff(
          c(ramp_up, y)
        )),
        ru_upper + 1)) +
          as.numeric(x[max_non_na]))

      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(c(y, lower_l_int))

    } else{
      # Store the linear interpolations in  a vector: -> z
      z <- as.numeric(y)

    }

    # Interpolate between points in x, return new x:
    return(as.numeric(ifelse(is.na(x), z, x)))

  } else{
    x

  }

}))}

df_list_interped_extrapped <- lapply(df_list, interped_df)
hello_friend
  • 5,682
  • 1
  • 11
  • 15
  • 1
    @Phil Hi Phils please let me know if this answer is what you are after or not happy to change it if need be, otherwise please upvote and accept it if it does what you require. – hello_friend Apr 28 '20 at 15:40
  • Unfortunately I can not check if it works today anymore but I will do it right away tomorrow morning. I'll get back to you then, thanks already! – Phil Apr 28 '20 at 15:54
0

I just realized that na.approx() did not interpolate any values because there are no values before the NoData values. Obviously, interpolation does not work then as it needs values before and after the NoData values in order to interpolate.

Phil
  • 85
  • 7
  • that is correct. But we can still use a linear function that will be consistent with interpolations in the relevant range. Please see my response below for a solution to your problem. – hello_friend Apr 30 '20 at 00:41