
I'm trying to programmatically change a variable from a 0 to a 1 if there are three 1s before and after a 0.

For example, if a vector contains 1, 1, 1, 0, 1, 1, 1, then I want to change the 0 to a 1.

Here is the data, in the column dummy_code of the data.frame original_df:

original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))

Here is how I'm trying to have the values be recoded:

desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))

I tried to use the function fill in the package tidyr, but this fills in missing values, so it won't work. If I were to recode the 0 values to be missing, then that would not work either, because it would simply code every NA as 1, when I would only want to code every NA surrounded by three 1s as 1.

Is there an efficient way to do this programmatically?

Joshua Rosenberg
  • Possible duplicate of http://stackoverflow.com/questions/23840590/how-to-fill-in-the-succeeding-numbers-whenever-there-is-a-0-in-r – akrun Feb 08 '17 at 13:13

2 Answers

Here is a one-liner using rollapply from zoo:

library(zoo)

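# Pad with three 0s on each side so every element gets a full 7-wide window;
# position 4 of each window (w) is the element being tested.
rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(w) if (all(w[-4] == 1)) 1 else w[4])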
##  [1] 1 0 0 1 1 1 1 1 1 1 0 0 1

Note: Input used was:

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
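For readers without zoo, the same sliding-window check can be sketched in base R. This is only an illustration of the idea, not part of the answer above: the helper name `fill_isolated_zero` and its parameter `k` are hypothetical, and the sketch assumes the input contains only 0s and 1s with no NA values.

```r
# Hypothetical helper (not from zoo): flip a 0 to 1 when the k values
# immediately before it and the k values immediately after it are all 1.
# Assumes x contains only 0/1 and no NA.
fill_isolated_zero <- function(x, k = 3) {
  n <- length(x)
  out <- x
  # Positions without k full neighbours on both sides can never qualify.
  if (n < 2 * k + 1) return(out)
  for (i in (k + 1):(n - k)) {
    left  <- x[(i - k):(i - 1)]
    right <- x[(i + 1):(i + k)]
    if (x[i] == 0 && all(left == 1) && all(right == 1)) out[i] <- 1
  }
  out
}

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
result <- fill_isolated_zero(x)
print(result)
## [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

Skipping the first and last `k` positions plays the same role as the zero padding in the rollapply call: elements near the ends can never have three 1s on both sides.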
G. Grothendieck

An rle alternative, using the x from @G. Grothendieck's answer:

r <- rle(x)

Find the indexes of runs of exactly three 1s:

i1 <- which(r$lengths == 3 & r$values == 1)

Check which of these runs of 1s have exactly one run between them (necessarily a run of 0s), and get the index of that run, which is the one to replace:

i2 <- i1[which(diff(i1) == 2)] + 1

Replace relevant 0 with 1:

r$values[i2] <- 1

Reverse the rle operation on the updated runs:

inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
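The rle steps above can also be wrapped in a small function. The sketch below is a hedged variation rather than the answer's own code: `fill_zero_runs`, `min_run`, and `single_zero_only` are hypothetical names. It assumes 0/1 input without NA values, treats "three 1s" as "at least `min_run` 1s", and by default only fills runs that consist of a single 0 (the steps above would fill a longer run of 0s as well, since they never check its length).

```r
# Hypothetical helper built on rle(): fill a run of 0s with 1s when the runs
# on both sides are runs of 1s with length >= min_run.
# Assumes x contains only 0/1 and no NA.
fill_zero_runs <- function(x, min_run = 3, single_zero_only = TRUE) {
  r <- rle(x)
  k <- length(r$values)
  # A run needs a neighbour on each side, so fewer than 3 runs means no change.
  if (k < 3) return(x)
  mid <- 2:(k - 1)  # interior runs only
  is_long_one <- r$values == 1 & r$lengths >= min_run
  target <- r$values[mid] == 0 & is_long_one[mid - 1] & is_long_one[mid + 1]
  # Optionally require that the run of 0s is a single 0.
  if (single_zero_only) target <- target & r$lengths[mid] == 1
  r$values[mid[target]] <- 1
  inverse.rle(r)
}

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
res_x <- fill_zero_runs(x)
print(res_x)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1

# A run of four 1s on the left also qualifies when min_run = 3.
res_y <- fill_zero_runs(c(1, 1, 1, 1, 0, 1, 1, 1))
print(res_y)
# [1] 1 1 1 1 1 1 1 1
```

Setting `single_zero_only = FALSE` makes the function fill a longer run of 0s too, as long as both neighbouring runs of 1s are long enough.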

A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:

library(data.table)
d <- data.table(x)

Calculate length of each run:

d[ , n := .N, by = rleid(x)]

For each "x" that is zero and whose preceding and subsequent runs of 1 are both of length 3, set "x" to 1:

d[x == 0 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1 
Henrik
  • sorry to change the game halfway through, but could either be modified for runs of `3` or greater? – Joshua Rosenberg Jan 29 '17 at 21:47
  • Yes, I assume you just need to change `== 3` to whatever condition you need, e.g. `>= 3`. Try! ;) – Henrik Jan 29 '17 at 21:56
  • For the `data.table` example, this worked: `d[dummy_code == 0 & shift(n) >= 3 & shift(n, type = "lead") >= 3, dummy_code := 1]$dummy_code`. was wondering whether another value (i.e., `:= 1`) needed to be changed, but now see that's for what is recoded. – Joshua Rosenberg Jan 29 '17 at 22:11
  • @JoshuaRosenberg I was a bit lazy when I borrowed the `x` from the other answer - less typing...Because you already have a `data.frame`, you can replace `d <- data.table(x)` with `setDT(original_df)`. Also note that the `x` is updated by reference (`x := 1`). I did the `$x` just to print the resulting vector. – Henrik Jan 29 '17 at 22:45
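Pulling the suggestions from these comments together, a data.table version that works directly on `original_df` and accepts runs of three or more 1s might look like the sketch below. The extra `n == 1` condition is an assumption added here, not part of the answer: it restricts the update to a single 0, because with `>= 3` a 0 in the middle of a long run of 0s would otherwise also match (its neighbouring rows belong to the same run of length 3 or more).

```r
library(data.table)

original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))

# Convert to a data.table by reference (no copy), as suggested in the comments.
setDT(original_df)

# Length of the run each row belongs to.
original_df[, n := .N, by = rleid(dummy_code)]

# Recode a single 0 (n == 1, an added assumption) whose neighbouring rows
# belong to runs of length 3 or more; those neighbours are necessarily 1s.
original_df[dummy_code == 0 & n == 1 &
              shift(n) >= 3 & shift(n, type = "lead") >= 3,
            dummy_code := 1]

# Drop the helper column again.
original_df[, n := NULL]

original_df$dummy_code
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

Because `:=` updates by reference, `original_df` itself is modified and no reassignment is needed.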