1

For each position in a vector, I want the most recent non-missing value at or before that position. For instance, given

x <- c(1,2,NA,NA,3,NA,4)

the function should return a vector like:

c(1,2,2,2,3,3,4)

Very simple question, but running it with loops or brute force on multiple columns takes forever.

Rubens
canary_in_the_data_mine
  • 2
    possible duplicate of [How to copy a value in a vector to next position(s) in vector](http://stackoverflow.com/questions/17320312/how-to-copy-a-value-in-a-vector-to-next-positions-in-vector) – eddi Jul 11 '13 at 21:46

2 Answers

5

You can use zoo::na.locf ("last observation carried forward") for that:

library(zoo)
x <- c(1, 2, NA, NA, 3, NA, 4)
na.locf(x)
## [1] 1 2 2 2 3 3 4
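
One detail worth checking before applying this to real columns: leading NAs have no earlier value to carry forward, and na.locf drops them by default, so the result can be shorter than the input. The sketch below shows the `na.rm = FALSE` argument that keeps the length unchanged, and one possible way to apply it to every column of a data frame. The vector `y` and the data frame `df` are made up for illustration.

```r
library(zoo)

# Hypothetical vector with a leading NA
y <- c(NA, 1, NA, 2)

na.locf(y)                 # leading NA dropped: 1 1 2
na.locf(y, na.rm = FALSE)  # length preserved:  NA 1 1 2

# Hypothetical data frame standing in for the "multiple columns" case
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# Fill each column in place; df[] keeps the data frame structure
df[] <- lapply(df, na.locf, na.rm = FALSE)
```

Keeping `na.rm = FALSE` matters in the data frame case because all columns must stay the same length.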
dickoa
2

You can do this using the Reduce function:

> x <- c(1,2,NA,NA,3,NA,4)
> locf <- function(x,y) if(is.na(y)) x else y
> Reduce( locf, x, accumulate=TRUE )
[1] 1 2 2 2 3 3 4

This way you do not need to load an extra package (and it could be customized to different types of objects if needed).
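
As one example of such customization, the same pattern can fill in the other direction (next observation carried backward) by folding from the right. This is only a sketch: the helper name `nocb` is made up here, and it relies on `Reduce` calling the function as `f(element, accumulated)` when `right = TRUE`.

```r
x <- c(1, 2, NA, NA, 3, NA, 4)

# With right = TRUE, the first argument is the current element and the
# second is the value accumulated from the elements to its right.
nocb <- function(cur, nxt) if (is.na(cur)) nxt else cur

res <- Reduce(nocb, x, accumulate = TRUE, right = TRUE)
res
# [1] 1 2 3 3 3 4 4
```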

The Reduce option is quicker than zoo::na.locf for the sample vector on my computer:

> library(zoo)
> library(microbenchmark)
> 
> microbenchmark( 
+ Reduce( locf, x, accumulate=TRUE ),
+ na.locf(x)
+ )
Unit: microseconds
                               expr     min       lq  median       uq     max neval
 Reduce(locf, x, accumulate = TRUE)  22.169  24.0160  27.506  29.3530 112.073   100
                         na.locf(x) 149.841 151.8945 154.357 169.5465 377.271   100

Though there may be other situations where na.locf will be faster. I was actually surprised at the amount of difference.


Benchmarking on a larger vector (one million elements) reverses the result: na.locf from the zoo package is roughly seven times faster than Reduce by median time:

x <- sample(c(1:5, NA), 1e6, TRUE)
require(zoo)
require(microbenchmark)
locf <- function(x,y) if(is.na(y)) x else y

microbenchmark(Reduce( locf, x, accumulate=TRUE ), na.locf(x), times=10)
Unit: milliseconds
                              expr       min        lq    median       uq      max neval
Reduce(locf, x, accumulate = TRUE) 5480.4796 5958.0905 6605.3547 7458.404 7915.046    10
                        na.locf(x)  661.2886  911.1734  950.2542 1026.348 1095.642    10
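
If neither option is fast enough, a fully vectorized base R version is another possibility. The sketch below is not from either answer and has not been benchmarked here; the function name `locf_vec` is made up. It computes, for each position, the index of the last non-missing value seen so far and then subsets by that index.

```r
locf_vec <- function(x) {
  # Position index where x is observed, 0 where it is NA
  idx <- seq_along(x) * !is.na(x)
  # Running maximum = index of the most recent non-NA value (0 if none yet)
  idx <- cummax(idx)
  # No earlier value to carry forward: index with NA so the result stays NA
  idx[idx == 0] <- NA
  x[idx]
}

x <- c(1, 2, NA, NA, 3, NA, 4)
locf_vec(x)
# [1] 1 2 2 2 3 3 4
```

Unlike the default behaviour of na.locf, this keeps leading NAs, so the output always has the same length as the input.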
Arun
Greg Snow
  • GregSnow, you can see the difference on bigger data. I've edited your post with the benchmarking results. In general, my impression is that `Reduce` does not scale well. – Arun Jul 13 '13 at 13:49
  • 1
    @Arun, It would be interesting to see where they switch places and how much that depends on other factors. – Greg Snow Jul 15 '13 at 16:51
  • Yes, definitely. I'll see if I can test some functions (from answers here on SO that use Reduce) and write it up here on SO. – Arun Jul 15 '13 at 17:38