0

I use R and R package dplyr. I want in data frame fill missing values from previous row value (increase it by 1). It works well if I have a gap of one, but if there are 2 or more consecutive missing values then I have a problem. Let say here I have a data frame and for some rows it values are missing.

  id val1 val2
1  0    0    0
2  1    1    1
3  2   NA    NA
4  3    3    3
5  4   NA   NA
6  5   NA   NA
7  6    6    6

Here val2 is for which I perform the increase and val1 is just original value. For NA values I want to take previous row value and increase it by 1. I use dplyr package and lag function.
Here is the code:

library(dplyr)
tmp.data <- data.frame(id = c(0:6),val1 = c(0:6)) %>%
  mutate(
    val1 = ifelse( val1 %in% c(2,4:5),NA,val1),
    val2 = val1,
  ) 

tmp.data <- tmp.data %>%
  mutate(
    val2 = ifelse(
      is.na(val2),
      lag(val2) + 1,
      val2
      )
  )
K.I.
  • 759
  • 1
  • 11
  • 30

1 Answers1

1

Solution 1a: Assuming increase from previous missing value(s), with 2 consecutive missing rows

Using dplyr:

tmp.data %>%
  mutate(val2 = ifelse(is.na(val2), lag(val2) + 1, val2),
         val2 = ifelse(is.na(val2), lag(val2) + 1, val2))

Solution 1b: Assuming increase from previous missing value(s), with N consecutive missing rows

Using data.table, zoo and dplyr:

setDT(tmp.data)[, consec := seq_len(.N), by=rleid(val2)]

tmp.data %>%
  mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
  select(-consec)

Or written together:

tmp.data %>%
  group_by(rleid(val2)) %>%
  mutate(consec = seq_along(val2)) %>%
  ungroup() %>%
  mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
  select(id, val1, val2)

Solution 2: Not assuming increase from previous missing value(s)

Using dplyr and zoo:

tmp.data %>%
  mutate(val2 = ifelse(is.na(val2), na.locf(val2) + 1, val2))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67