Recode a value in a vector based on surrounding values

Question

I'm trying to programmatically change a value from 0 to 1 if there are three 1s immediately before and after it.

For example, if the values in a vector were 1, 1, 1, 0, 1, 1, 1, then I want to change the 0 to a 1.

Here is the data, in the column dummy_code of the data.frame original_df:

original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))

Here is how I want the values to be recoded:

desired_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1))

I tried to use the function fill in the package tidyr, but this fills in missing values, so it won't work. If I were to recode the 0 values to be missing, then that would not work either, because it would simply code every NA as 1, when I would only want to code every NA surrounded by three 1s as 1.

Is there an efficient way to do this programmatically?

Possible duplicate of http://stackoverflow.com/questions/23840590/how-to-fill-in-the-succeeding-numbers-whenever-there-is-a-0-in-r — akrun, Feb 08 '17 at 13:13

G. Grothendieck · Answer 1 · 2017-01-29T20:51:37.903

3

Here is a one-liner using rollapply from zoo:

library(zoo)

rollapply(c(0, 0, 0, x, 0, 0, 0), 7, function(x) if (all(x[-4] == 1)) 1 else x[4])
##  [1] 1 0 0 1 1 1 1 1 1 1 0 0 1

Note: Input used was:

x <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)
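The comments below ask which `x` is which in the one-liner. As a minimal sketch of the same rollapply approach, the version here uses distinct names: `dummy_code` for the input, `padded` for the zero-padded copy, and `w` for the 7-value window passed to the function. These names and the helper `recode_centre` are illustrative assumptions, not part of the original answer.

```r
library(zoo)

dummy_code <- c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1)

# Pad with three 0s on each side so every element gets a full 7-wide window.
padded <- c(rep(0, 3), dummy_code, rep(0, 3))

# `w` is one window of 7 values; w[4] is the element being recoded.
# If the three values on each side are all 1, return 1; otherwise keep w[4].
recode_centre <- function(w) if (all(w[-4] == 1)) 1 else w[4]

recoded <- rollapply(padded, width = 7, FUN = recode_centre)
recoded
##  [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

The padding with 0s means elements near either end can never be recoded, because their windows always contain at least one padded 0.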

edited Jan 29 '17 at 20:51

answered Jan 29 '17 at 19:53


Is `x` in `c(0, 0, 0, x, 0, 0, 0)` referring to input? If so, can we maybe change to disambiguate? Thanks – Joshua Rosenberg Jan 29 '17 at 20:12
The answer does say "The input used was: `x <-` " – G. Grothendieck Jan 29 '17 at 20:50
Yes, but isn't it a little confusing, as there is also an `x` in the lambda function? Thanks again – Joshua Rosenberg Jan 29 '17 at 20:52

Henrik · Accepted Answer · 2017-01-29T22:39:37.520

3

An rle alternative, using the x from @G. Grothendieck's answer:

r <- rle(x)

Find the indexes of runs of three 1s:

i1 <- which(r$lengths == 3 & r$values == 1)

Check which of the "1 indexes" surround a 0, and get the indexes of the 0 to be replaced:

i2 <- i1[which(diff(i1) == 2)] + 1

Replace relevant 0 with 1:

r$values[i2] <- 1

Reverse the rle operation on the updated runs:

inverse.rle(r)
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
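The rle steps above can be collected into one function. This is only a sketch: the name `fill_zero_rle` is hypothetical, and the extra guard line is an addition that is not in the answer. Without it, a run of several 0s between two runs of three 1s (for example 1, 1, 1, 0, 0, 1, 1, 1) would be recoded as a whole, because `r$values[i2] <- 1` replaces the entire run.

```r
# Hypothetical wrapper around the rle steps; not part of the original answer.
fill_zero_rle <- function(x) {
  r <- rle(x)

  # Runs of exactly three 1s.
  i1 <- which(r$lengths == 3 & r$values == 1)

  # Runs sitting between two consecutive "three 1s" runs.
  i2 <- i1[which(diff(i1) == 2)] + 1

  # Added guard (an assumption about the intent): keep only runs that are
  # a single 0, so longer gaps and non-zero values are left alone.
  i2 <- i2[r$lengths[i2] == 1 & r$values[i2] == 0]

  r$values[i2] <- 1
  inverse.rle(r)
}

fill_zero_rle(c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

Dropping the guard line gives back the behaviour of the step-by-step code above.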

A similar solution based on data.table::rleid, slightly more compact and perhaps easier to read:

library(data.table)
d <- data.table(x)

Calculate length of each run:

d[ , n := .N, by = rleid(x)]

For each "x" that is zero, where the preceding and subsequent runs of 1 both have length 3, set "x" to 1:

d[x == 0 & shift(n) == 3 & shift(n, type = "lead") == 3, x := 1]
d$x
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1

edited Jan 29 '17 at 22:39

answered Jan 29 '17 at 20:31


sorry to change the game halfway through, but could either be modified for runs of `3` or greater? – Joshua Rosenberg Jan 29 '17 at 21:47
Yes, I assume you just need to change `== 3` to whatever condition you need, e.g. `>= 3`. Try! ;) – Henrik Jan 29 '17 at 21:56
For the `data.table` example, this worked: `d[dummy_code == 0 & shift(n) >= 3 & shift(n, type = "lead") >= 3, dummy_code := 1]$dummy_code`. I was wondering whether another value (i.e., `:= 1`) needed to be changed, but I now see that it is the value being assigned. – Joshua Rosenberg Jan 29 '17 at 22:11
@JoshuaRosenberg I was a bit lazy when I borrowed the `x` from the other answer - less typing...Because you already have a `data.frame`, you can replace `d <- data.table(x)` with `setDT(original_df)`. Also note that the `x` is updated by reference (`x := 1`). I did the `$x` just to print the resulting vector. – Henrik Jan 29 '17 at 22:45
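Putting the comments together, a sketch of the data.table version applied directly to `original_df` with `setDT` and the `>= 3` condition might look like this. The `n == 1` term is an added assumption, not part of the answer or the comments: without it, a 0 in the middle of a run of three or more 0s also passes both `shift(n)` tests, because its neighbours belong to the same long run.

```r
library(data.table)

original_df <- data.frame(dummy_code = c(1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1))

# Convert to a data.table by reference (no copy is made).
setDT(original_df)

# Length of the run each row belongs to.
original_df[, n := .N, by = rleid(dummy_code)]

# Recode a lone 0 (n == 1, added guard) whose neighbouring runs have length >= 3.
original_df[dummy_code == 0 & n == 1 &
              shift(n) >= 3 & shift(n, type = "lead") >= 3,
            dummy_code := 1]

# Drop the helper column and print the updated vector.
original_df[, n := NULL]
original_df$dummy_code
# [1] 1 0 0 1 1 1 1 1 1 1 0 0 1
```

Because `:=` updates by reference, `original_df` itself is modified; no reassignment is needed.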
