
I would like to replace all consecutive NA values in each row with zero, but only if the number of consecutive NAs is at most a parameter maxgap.

This is very similar to what the function zoo::na.locf does:

x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)
zoo::na.locf(x,  maxgap = 2, na.rm = FALSE)

gives

[1] NA 1 2 3 3 3 5 6 7 NA NA NA

This differs from what I want in two ways: I would like the leading NA to be replaced too, and I would like the two consecutive NAs to be replaced with 0 rather than with the last non-NA value.

I would like to get

0 1 2 3 0 0 5 6 7 NA NA NA

How can I do this in R? Can I use functions from the tidyverse?

Richi W

3 Answers


Let y be the result of the na.locf line. If y[i] is not NA but x[i] is NA, then na.locf filled that position, so assign 0 to it. Also replace leading NAs, which are exactly the positions where the cumsum(...) term below is 0.

y <- zoo::na.locf(x, maxgap = 2, na.rm = FALSE)
replace(y, (!is.na(y) & is.na(x)) | cumsum(!is.na(y)) == 0, 0)
## [1]  0  1  2  3  0  0  5  6  7 NA NA NA
G. Grothendieck
  • How can we apply this elegantly to a data.frame or tibble, row by row? x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA) y = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA) d = data.frame( rbind(x,y)) – Richi W Feb 17 '17 at 14:25
  • If function `f` works on a single vector then: `t(apply(d, 1, f))` produces a matrix in which each row is transformed. – G. Grothendieck Feb 17 '17 at 20:07
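Putting the answer and the comment together, here is a minimal sketch that wraps the replace() idea in a function with a maxgap argument and applies it row by row with t(apply(...)). It assumes the zoo package is installed; the helper name na_gap_to_zero and the second example row y are made up for illustration.

```r
library(zoo)

# Hypothetical helper: zero the NA runs that na.locf() fills (length <= maxgap),
# plus any leading NAs (positions where cumsum(!is.na(y)) == 0).
na_gap_to_zero <- function(x, maxgap) {
  y <- na.locf(x, maxgap = maxgap, na.rm = FALSE)
  replace(y, (!is.na(y) & is.na(x)) | cumsum(!is.na(y)) == 0, 0)
}

x <- c(NA, 1, 2, 3, NA, NA, 5, 6, 7, NA, NA, NA)
y <- c(1, NA, NA, NA, 2, NA, 3, 4, NA, NA, 5, 6)  # made-up second row
d <- data.frame(rbind(x, y))

# apply() over rows returns one column per input row, so transpose back.
res <- t(apply(d, 1, na_gap_to_zero, maxgap = 2))
res
```

Note that the cumsum(...) term zeroes a leading NA run whatever its length, so in this approach maxgap only limits the other NA runs.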

We can use rle (run-length encoding) to do this:

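f1 <- function(vec){
  # Run-length encode the NA pattern: values is TRUE for NA runs.
  rl <- rle(is.na(vec))
  lst <- within.list(rl, {
    i1 <- seq_along(values) == 1               # first run
    i2 <- seq_along(values) != length(values)  # every run except the last
    # Keep NA runs of length exactly 2 that are not the last run, plus a
    # leading NA run; mark every other run FALSE.
    values[!((lengths == 2 & values & i2) |
             (values & i1))] <- FALSE
  })
  # Expand back to one logical per element and zero the marked positions.
  vec[inverse.rle(lst)] <- 0
  vec
}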
f1(x)
#[1]  0  1  2  3  0  0  5  6  7 NA NA NA
akrun
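f1 hard-codes a run length of exactly 2. A minimal sketch of the same rle idea with a maxgap parameter could look like the following; the function name na_runs_to_zero is hypothetical, and unlike f1 it applies the same length test to every NA run, whether leading, interior or trailing.

```r
# Hypothetical generalization of f1: zero every NA run whose length is <= maxgap.
na_runs_to_zero <- function(vec, maxgap) {
  rl <- rle(is.na(vec))
  # A run qualifies only if it is an NA run and not longer than maxgap.
  rl$values <- rl$values & rl$lengths <= maxgap
  # Expand back to one logical per element and overwrite those positions.
  vec[inverse.rle(rl)] <- 0
  vec
}

x <- c(NA, 1, 2, 3, NA, NA, 5, 6, 7, NA, NA, NA)
na_runs_to_zero(x, maxgap = 2)
# [1]  0  1  2  3  0  0  5  6  7 NA NA NA
```

If trailing NA runs should never be replaced, as in f1, the "not the last run" condition can be added back to the qualifying test.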

You could, for example, do this:

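library(data.table)  # provides rleid()
library(dplyr)

x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)

# Called once per run of equal values: if the run is an NA run of
# length n <= maxgap, replace it with 0; otherwise return it unchanged.
my_replace <- function(x, n, maxgap){
  if(is.na(x[1]) && n <= maxgap){
    x <- 0
  }
  x
}

data.frame(x, y=x) %>% 
  # one group per run of equal values in x (consecutive NAs form one run);
  # y reuses x's runs, which works here because y is a copy of x
  group_by(data.table::rleid(x)) %>% 
  mutate(x = my_replace(x, n(), 2), y = my_replace(y, n(), 1)) %>% 
  ungroup() %>% 
  select(x,y)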

This lets you set maxgap per column: 2 for x and 1 for y.

This results in:

# A tibble: 12 × 2
       x     y
   <dbl> <dbl>
1      0     0
2      1     1
3      2     2
4      3     3
5      0    NA
6      0    NA
7      5     5
8      6     6
9      7     7
10    NA    NA
11    NA    NA
12    NA    NA
Rentrop
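If dplyr and data.table are not otherwise needed, a base-R sketch can also give each column its own maxgap, here with Map(). The helper na_runs_to_zero and the named vector maxgaps are hypothetical; the helper uses rle() so that an NA run is zeroed only when its length is at most maxgap.

```r
# Hypothetical helper: zero NA runs of length <= maxgap.
na_runs_to_zero <- function(vec, maxgap) {
  rl <- rle(is.na(vec))
  rl$values <- rl$values & rl$lengths <= maxgap
  vec[inverse.rle(rl)] <- 0
  vec
}

x <- c(NA, 1, 2, 3, NA, NA, 5, 6, 7, NA, NA, NA)
d <- data.frame(x, y = x)

# One maxgap per column, looked up by column name: 2 for x, 1 for y.
maxgaps <- c(x = 2, y = 1)
d[] <- Map(na_runs_to_zero, d, maxgaps[names(d)])
d
```

Indexing maxgaps by names(d) matches each limit to its column by name rather than by position, and the runs are computed per column, so y does not need to share x's NA pattern.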