
I have two solutions that should, in theory, be identical: one is vectorized and the other uses a for loop. But the vectorized solution returns the wrong result, and I want to understand why. The logic is simple: replace each NA with the previous non-NA value in the vector.

# vectorized
f1 <- function(x) {
    idx <- which(is.na(x))
    x[idx] <- x[ifelse(idx > 1, idx - 1, 1)]
    x
}

# non-vectorized
f2 <- function(x) {
    for (i in 2:length(x)) {
        if (is.na(x[i]) && !is.na(x[i - 1])) {
            x[i] <- x[i - 1]
        }
    }
    x
}

v <- c(NA,NA,1,2,3,NA,NA,6,7)
f1(v)
# [1] NA NA  1  2  3  3 NA  6  7
f2(v)
# [1] NA NA  1  2  3  3  3  6  7
Eldar Agalarov

2 Answers


The two pieces of code are different.

  • The first one replaces each NA with the previous element of the original vector, so an NA that follows another NA stays NA.
  • The second one replaces each NA with the previous element if that element is not NA, but the previous element may itself be the result of an earlier NA substitution.
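To make the difference concrete, here is a small sketch that breaks the assignment inside f1 into its steps, using the vector from the question. The intermediate name src is only for illustration. The right-hand side is evaluated in full against the unmodified vector before anything is assigned:

```r
v <- c(NA, NA, 1, 2, 3, NA, NA, 6, 7)

# Positions of the missing values
idx <- which(is.na(v))                # 1 2 6 7

# Positions f1 reads from (src is an illustrative name, not in the question)
src <- ifelse(idx > 1, idx - 1, 1)    # 1 1 5 6

# Values read from the ORIGINAL vector, before any assignment happens
v[src]                                # NA NA 3 NA
```

Position 7 reads from position 6, which is still NA at that point, so it stays NA. The loop in f2 has already filled position 6 by the time it reaches position 7.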

Which one is correct depends on what you need. The second behaviour is harder to vectorize, but it is already implemented in existing functions such as zoo::na.locf.

Or, if you only want to use base packages, you could have a look at this answer.
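For reference, one possible base-only vectorized sketch of the second behaviour is shown below. The function name locf_base is made up for this example, and this is not necessarily the approach taken in the linked answer. It relies on cummax to track the position of the most recent non-NA value:

```r
# Hypothetical helper: carry the last non-NA value forward, base R only
locf_base <- function(x) {
    # Index of the most recent non-NA value at or before each position
    # (0 where no non-NA value has been seen yet)
    last_seen <- cummax(seq_along(x) * !is.na(x))
    # Prepend NA so that "0" maps to NA instead of being dropped
    c(NA, x)[last_seen + 1]
}

v <- c(NA, NA, 1, 2, 3, NA, NA, 6, 7)
locf_base(v)
# [1] NA NA  1  2  3  3  3  6  7
```

Leading NAs are kept as NA because there is no earlier value to carry forward, which matches the output of f2 on this input.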

digEmAll

These two solutions are not equivalent. The first function behaves more like this:

f2_as_f1 <- function(x) {
    y <- x # a copy of x
    for (i in 2:length(x)) {
        if (is.na(y[i])) {
            x[i] <- y[i - 1]
        }
    }
    x
}

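Note the use of the y vector: every read comes from the unmodified copy, so a value filled in at one position is never seen when the next position is processed.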

gagolews