Find most recent non-missing value in a vector

Question

I'm trying to fill each missing value in a vector with the most recent non-missing value before it. For instance, given

x <- c(1,2,NA,NA,3,NA,4)

Then the function should return a vector like:

c(1,2,2,2,3,3,4)

Very simple question, but running it with loops or brute force on multiple columns takes forever.

possible duplicate of [How to copy a value in a vector to next position(s) in vector](http://stackoverflow.com/questions/17320312/how-to-copy-a-value-in-a-vector-to-next-positions-in-vector) — eddi, Jul 11 '13 at 21:46

score 5 · Accepted Answer · answered Jul 11 '13 at 21:43


You can use zoo::na.locf ("last observation carried forward") for that:

require(zoo)
x <- c(1, 2, NA, NA, 3, NA, 4)
na.locf(x)
## [1] 1 2 2 2 3 3 4
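
One detail worth knowing when applying this to several columns: by default na.locf drops leading NAs (na.rm = TRUE), so the result can be shorter than the input. The sketch below, which assumes zoo is installed and uses a made-up data frame `df` with columns `a` and `b` purely for illustration, shows na.rm = FALSE keeping the length unchanged:

```r
library(zoo)

x <- c(NA, 1, 2, NA, NA, 3, NA, 4)

# Default (na.rm = TRUE): the leading NA is dropped, so the result is shorter
short <- na.locf(x)

# na.rm = FALSE keeps the leading NA, so the length matches the input
full <- na.locf(x, na.rm = FALSE)

# Hypothetical data frame, for illustration only
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 5, NA, 6))

# Fill every column while keeping the number of rows unchanged
df[] <- lapply(df, na.locf, na.rm = FALSE)
```

With na.rm = FALSE, any value that has no earlier non-missing value (such as the first element of `b`) simply stays NA.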

answered Jul 11 '13 at 21:43 by dickoa

Perfect, thanks. [Will accept after the 15 minute interval unless a better answer appears.] – canary_in_the_data_mine Jul 11 '13 at 21:48

score 2 · Answer 2 · edited Jul 13 '13 at 13:49

You can do this using the Reduce function:

> x <- c(1,2,NA,NA,3,NA,4)
> locf <- function(x,y) if(is.na(y)) x else y
> Reduce( locf, x, accumulate=TRUE )
[1] 1 2 2 2 3 3 4

This way you do not need to load an extra package (and it could be customized to different types of objects if needed).

The Reduce option is quicker than zoo::na.locf for the sample vector on my computer:

> library(zoo)
> library(microbenchmark)
> 
> microbenchmark( 
+ Reduce( locf, x, accumulate=TRUE ),
+ na.locf(x)
+ )
Unit: microseconds
                               expr     min       lq  median       uq     max neval
 Reduce(locf, x, accumulate = TRUE)  22.169  24.0160  27.506  29.3530 112.073   100
                         na.locf(x) 149.841 151.8945 154.357 169.5465 377.271   100

There may be other situations where na.locf is faster, though. I was actually surprised by the size of the difference.

Benchmarking on larger data (a vector of 1e6 elements) reverses the result: na.locf from the zoo package is clearly faster than Reduce:

x <- sample(c(1:5, NA), 1e6, TRUE)
require(zoo)
require(microbenchmark)
locf <- function(x,y) if(is.na(y)) x else y

microbenchmark(Reduce( locf, x, accumulate=TRUE ), na.locf(x), times=10)
Unit: milliseconds
                              expr       min        lq    median       uq      max neval
Reduce(locf, x, accumulate = TRUE) 5480.4796 5958.0905 6605.3547 7458.404 7915.046    10
                        na.locf(x)  661.2886  911.1734  950.2542 1026.348 1095.642    10

GregSnow, you can see the difference on bigger data. I've edited your post with the benchmarking results. In general, I'm of the opinion that `Reduce` does not scale well. — Arun, Jul 13 '13 at 13:49
@Arun, It would be interesting to see where they switch places and how much that depends on other factors. — Greg Snow, Jul 15 '13 at 16:51
Yes, definitely. I'll try to test some functions (from answers here on SO that use Reduce) and write it up here on SO. — Arun, Jul 15 '13 at 17:38
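
For anyone who wants to avoid both the extra package and the per-element function calls that Reduce makes, a vectorized base R approach is another option. This is only a sketch of a different technique (indexing with cummax), not something proposed in the answers above; `locf_base` and `df` are hypothetical names, and no timings are claimed for it here:

```r
# Carry the last non-missing value forward using only base R.
locf_base <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L      # missing positions contribute no index of their own
  idx <- cummax(idx)       # position of the latest non-missing value so far
  idx[idx == 0L] <- NA     # leading NAs have nothing to carry forward
  x[idx]
}

x <- c(1, 2, NA, NA, 3, NA, 4)
locf_base(x)
## [1] 1 2 2 2 3 3 4

# Applied to every column of a hypothetical data frame
df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 5, NA, 6))
df[] <- lapply(df, locf_base)
```

Unlike the default na.locf call, this keeps leading NAs in place, so the output always has the same length as the input.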
