What's a clever (i.e., not a loop) way to get the length of each spell of missing values in a vector? My ideal output is a vector that is the same length, in which each missing value is replaced by the length of the spell of missing values of which it was a part, and all other values are 0's.

So, for input like:

x <- c(2,6,1,2,NA,NA,NA,3,4,NA,NA)

I'd like output like:

y <- c(0,0,0,0,3,3,3,0,0,2,2)
smci
daanoo

3 Answers

One simple option using rle:

> m <- rle(is.na(x))
> rep(ifelse(m$values, m$lengths, 0), times = m$lengths)
[1] 0 0 0 0 3 3 3 0 0 2 2
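
To make the steps explicit, the same idea can be wrapped in a small function. This is only a sketch: the name `na_spell_lengths` is hypothetical, and it assumes `x` is an atomic vector.

```r
# Hypothetical helper wrapping the rle() approach; the name is an assumption.
na_spell_lengths <- function(x) {
  m <- rle(is.na(x))                         # runs of TRUE (NA) and FALSE (non-NA)
  spell <- ifelse(m$values, m$lengths, 0L)   # keep the run length only for NA runs
  rep(spell, times = m$lengths)              # expand each run back to its original length
}

x <- c(2, 6, 1, 2, NA, NA, NA, 3, 4, NA, NA)
y <- na_spell_lengths(x)
```

Because `rle()` works on whole runs, a spell at the very start or end of the vector needs no special handling.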
joran

I was independently working on something using rle() and either cumsum() or dplyr group_by() and n() to get group-lengths of NAs:

> x2 <- as.numeric(is.na(x))
> x2
[1] 0 0 0 0 1 1 1 0 0 1 1

> rle(x2)
Run Length Encoding
  lengths: int [1:4] 4 3 2 2
  values : num [1:4] 0 1 0 1

# Now we can assign group-numbers...
> cumsum(c(diff(x2)==+1,0)) * x2
  0 0 0 0 1 1 1 0 0 2 2
# ...then get group-lengths from counting those...
> rle(cumsum(c(diff(x2)==+1,0)) * x2)
Run Length Encoding
  lengths: int [1:4] 4 3 2 2
  values : num [1:4] 0 1 0 2

We could kludge something, but it won't be as compact and elegant as @joran's solution.
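
One way the group-number idea might be finished, as a sketch rather than a polished solution: count how often each group number occurs with `tabulate()` and index back into those counts. Note one deliberate deviation from the snippet above: a 0 is prepended before `diff()` instead of appended after it, on the assumption that a spell starting at position 1 should also get a group number. The names `grp` and `sizes` are illustrative.

```r
x  <- c(2, 6, 1, 2, NA, NA, NA, 3, 4, NA, NA)
x2 <- as.numeric(is.na(x))

# Group number per NA spell; prepending 0 lets a spell at position 1 count as a start
grp <- cumsum(diff(c(0, x2)) == 1) * x2

# tabulate() counts each positive group number and silently ignores the 0 entries
sizes <- tabulate(grp)

# Fill in the spell length wherever a group number is present
y <- numeric(length(x))
y[grp > 0] <- sizes[grp[grp > 0]]
```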

smci

Here is another option, using rleid() from data.table together with ave():

library(data.table)
ave(x, rleid(is.na(x)), FUN = length)*is.na(x)
#[1] 0 0 0 0 3 3 3 0 0 2 2
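
If data.table is not available, the run IDs can be approximated in base R. The following is a sketch of that substitution, not the code above: `run_id` is an illustrative name, and the `cumsum()` expression is assumed to stand in for `rleid()` only for a non-empty vector like this one.

```r
x <- c(2, 6, 1, 2, NA, NA, NA, 3, 4, NA, NA)

# A new run starts at position 1 and wherever is.na(x) changes value
run_id <- cumsum(c(TRUE, diff(is.na(x)) != 0))

# ave() replaces each element with the size of its run; multiplying by is.na(x)
# zeroes out the runs of non-missing values
y <- ave(x, run_id, FUN = length) * is.na(x)
```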
akrun