
There are multiple ways to fill missing values in R. However, I can't find a way to fill only the first n NAs in each run of consecutive NAs, that is, to carry the last observation forward at most n times.

Available options:

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

library(zoo)

na.locf(na_vector)
# Outputs: [1] 1 1 1 1 2 3 3 3

na.locf0(na_vector, maxgap = 2)
# Outputs: [1] 1 NA NA NA  2  3  3  3

How I would like it to be:

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

fill_na <- function(vector, n){
   ...
}

fill_na(na_vector, n = 1)
# Outputs: [1] 1 1 NA NA  2  3  3  NA

fill_na(na_vector, n = 2)
# Outputs: [1] 1 1 1 NA  2  3  3  3

3 Answers


Here is an option to get those outputs using dplyr and recursion:

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

fill_na <- function(vector, n){
  if (n == 0) {
    vector
  } else {
    fill_na(
      vector = dplyr::coalesce(vector, dplyr::lag(vector)),
      n = n - 1
    )
  }
}

fill_na(na_vector, n = 1)
# [1]  1  1 NA NA  2  3  3 NA

fill_na(na_vector, n = 2)
# [1]  1  1  1 NA  2  3  3  3
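
Each recursive call fills at most one more NA per run, because dplyr::lag() shifts the vector by a single position. As a rough sketch of the same idea without the dplyr dependency, the recursion can be unrolled into a loop. The name fill_na_loop is hypothetical and not part of the answer above.

```r
# Hypothetical base-R variant of the recursive approach:
# each pass fills at most one additional NA per run.
fill_na_loop <- function(x, n) {
  for (i in seq_len(n)) {
    # x shifted right by one position, similar to dplyr::lag(x)
    prev <- c(NA, head(x, -1))
    # keep existing values, take the previous value where x is NA
    x <- ifelse(is.na(x), prev, x)
  }
  x
}

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

fill_na_loop(na_vector, n = 1)
# [1]  1  1 NA NA  2  3  3 NA

fill_na_loop(na_vector, n = 2)
# [1]  1  1  1 NA  2  3  3  3
```

With n = 0 the loop body never runs and the input is returned unchanged, which matches the base case of the recursive version.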
Santiago

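Number the elements within each consecutive run (groupid from collapse assigns an id to each run of NAs or non-NAs), giving the vector a, and then fill in only those NAs whose number is less than or equal to n. This uses only vectorized operations internally, with no explicit iteration or recursion.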

library(collapse)
library(zoo)

fill_na <- function(x, n) {
  a <- ave(x, groupid(is.na(x)), FUN = seq_along)
  ifelse(a <= n, na.locf0(x), x)
}

fill_na(na_vector, 1)
## [1]  1  1 NA NA  2  3  3 NA
fill_na(na_vector, 2)
## [1]  1  1  1 NA  2  3  3  3
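
For readers without collapse or zoo installed, the same run-numbering idea can be sketched in base R. This is only an illustration under stated assumptions: the name fill_na_base is hypothetical, rle() and sequence() stand in for groupid() and ave(), and a cummax() index trick stands in for na.locf0().

```r
# Hypothetical base-R sketch of the run-numbering approach
fill_na_base <- function(x, n) {
  # position of each element within its run of NAs or non-NAs
  runs <- rle(is.na(x))
  a <- sequence(runs$lengths)

  # index of the most recent non-NA element (0 if there is none yet)
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # last observation carried forward; leading NAs stay NA
  locf <- c(NA, x)[idx + 1]

  # fill only the first n NAs of each run
  ifelse(is.na(x) & a <= n, locf, x)
}

na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

fill_na_base(na_vector, 1)
## [1]  1  1 NA NA  2  3  3 NA
fill_na_base(na_vector, 2)
## [1]  1  1  1 NA  2  3  3  3
```

For na_vector the run positions are 1 1 2 3 1 2 1 2, so with n = 1 only the second and seventh elements are filled.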
G. Grothendieck

Here is a solution that imputes everything except the last n NAs, based on base R and imputeTS.

library(imputeTS)
na_vector <- c(1, NA, NA, NA, 2, 3, NA, NA)

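# Impute everything except the last n NAs
fill_except_last_n_na <- function(x, n) {
  # Positions from which exactly n + 1 NAs remain up to the end of x;
  # the last such position is the (n + 1)-th NA counted from the end
  index <- which(rev(cumsum(rev(as.numeric(is.na(x))))) == n + 1)
  # With n or fewer NAs in total there is nothing to impute
  if (length(index) == 0) return(x)
  last <- tail(index, 1)
  x[1:last] <- na_locf(x[1:last])
  x
}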

# Call the new function
fill_except_last_n_na(na_vector,2)

## Result
## [1]  1  1  1  1  2  3 NA NA

If you want an imputation method other than last observation carried forward, you can replace na_locf with na_ma (moving average), na_interpolation (interpolation), na_kalman (Kalman smoothing on state space models), or another imputation function provided by the imputeTS package (see the imputeTS documentation for a list of functions).

Steffen Moritz