I have a data frame "Data" with a grouping variable "grp" and a binary classification variable "classif". Within each group of grp, I want to create a "result" variable that numbers the separate blocks of consecutive 0s in classif, with rows where classif is 1 set to 0. So far, I don't know how to restart the count for each level of the grouping variable, and I can't find a way to number only the blocks of 0s (ignoring the 1s).

Example data:

grp <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
classif <- c(0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,1,0,0,0,1,0,1,0)
result <- c(1,0,2,2,0,3,3,0,0,1,1,1,1,0,2,0,0,0,3,3,0,0,1,1,1,0,2,0,3)
wrong_result <- c(1,2,3,3,4,5,5,1,1,2,2,2,2,3,4,5,5,5,6,6,1,1,2,2,2,3,4,5,6)
Data <- data.frame(grp,classif,result, wrong_result)

I have tried using rleid from data.table, but the following commands number every run (1s as well as 0s) and so produce "wrong_result", which is not what I'm after.

library(data.table)
setDT(Data)
Data[, wrong_result := rleid(classif)]
Data[, wrong_result := rleid(classif), by = grp]
  • [Create counter of consecutive runs of a certain value](https://stackoverflow.com/questions/27077228/create-counter-of-consecutive-runs-of-a-certain-value), but by group. – Henrik Feb 12 '23 at 14:22
  • E.g. `ave(d$classif, d$grp, FUN = \(x) with(rle(x==0), rep(cumsum(values)*values, lengths)))` – Henrik Feb 12 '23 at 14:54

2 Answers

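With dplyr, use cumsum() and lag() to number the blocks of zeroes within each group. A block starts wherever classif is 0 and the previous value in the group is 1 (or there is no previous value), and cumsum() counts those starts. The .by argument requires dplyr 1.1.0 or later; with older versions, use group_by(grp) instead.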

library(dplyr)

Data %>%
  mutate(
    result2 = ifelse(
      classif == 0,
      cumsum(classif == 0 & lag(classif, default = 1) == 1),
      0
    ),
    .by = grp
  )
   grp classif result result2
1    1       0      1       1
2    1       1      0       0
3    1       0      2       2
4    1       0      2       2
5    1       1      0       0
6    1       0      3       3
7    1       0      3       3
8    2       1      0       0
9    2       1      0       0
10   2       0      1       1
11   2       0      1       1
12   2       0      1       1
13   2       0      1       1
14   2       1      0       0
15   2       0      2       2
16   2       1      0       0
17   2       1      0       0
18   2       1      0       0
19   2       0      3       3
20   2       0      3       3
21   3       1      0       0
22   3       1      0       0
23   3       0      1       1
24   3       0      1       1
25   3       0      1       1
26   3       1      0       0
27   3       0      2       2
28   3       1      0       0
29   3       0      3       3
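
For dplyr versions older than 1.1.0, where .by is not available, the same logic can presumably be written with group_by(). The following is a minimal sketch, with the example data re-created so it runs on its own; the names expected and res are only illustrative.

```r
library(dplyr)

# Example data from the question, re-created so the sketch is self-contained
grp <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3)
classif <- c(0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,1,0,0,0,1,0,1,0)
expected <- c(1,0,2,2,0,3,3,0,0,1,1,1,1,0,2,0,0,0,3,3,0,0,1,1,1,0,2,0,3)
Data <- data.frame(grp, classif)

res <- Data %>%
  group_by(grp) %>%                 # stands in for .by = grp
  mutate(
    result2 = ifelse(
      classif == 0,
      # a block starts where a 0 follows a 1 (or opens the group)
      cumsum(classif == 0 & lag(classif, default = 1) == 1),
      0
    )
  ) %>%
  ungroup()                         # drop the grouping again

# compare with the desired result from the question
all(res$result2 == expected)
```

The ungroup() call is optional, but it avoids carrying the grouping into later steps.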
zephryl
  • Simple and efficient, thanks! I used group_by() as .by doesn't seem to work. I might have to update the package! I was wondering if the same could be done within the data.table framework. I will look at it and post my findings here. – user1969717 Feb 13 '23 at 10:53
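
On the data.table question: one possible translation of the same cumsum() idea uses shift() in place of lag(). This is only a sketch under that assumption, not a tested recommendation from either answer; DT and expected are illustrative names, and the example data is re-created so the block runs on its own.

```r
library(data.table)

# Example data from the question, re-created so the sketch is self-contained
DT <- data.table(
  grp = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
  classif = c(0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,1,0,0,0,1,0,1,0)
)
expected <- c(1,0,2,2,0,3,3,0,0,1,1,1,1,0,2,0,0,0,3,3,0,0,1,1,1,0,2,0,3)

# Count block starts (a 0 preceded by a 1, or a 0 opening the group),
# then multiply by (classif == 0) to zero out the rows where classif is 1.
DT[, result2 := cumsum(classif == 0 & shift(classif, fill = 1) == 1) * (classif == 0),
   by = grp]

# compare with the desired result from the question
all(DT$result2 == expected)
```

Here shift(classif, fill = 1) plays the role of lag(classif, default = 1), so a group that begins with 0 still opens block 1.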

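Use rle to find the runs, number the runs of 0s sequentially, convert back to a full-length vector with inverse.rle, and then zero out the positions where classif is 1. ave applies the function within each group. No packages are used.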

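seq0 <- function(x) {
  r <- rle(x)                         # run lengths and run values
  is0 <- r$values == 0                # which runs are runs of 0s
  r$values[is0] <- seq_len(sum(is0))  # number the 0-runs 1, 2, 3, ...
  inverse.rle(r) * !x                 # expand to full length; zero out the 1s
}
# apply seq0 separately within each level of grp
transform(Data, result2 = ave(classif, grp, FUN = seq0))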
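
To see what each step does, here is the same sequence unrolled on a short vector. The vector x and the name out are made up for illustration; the values in the comments were traced by hand.

```r
x <- c(0, 1, 0, 0, 1, 0)            # illustrative input, not from the question

r <- rle(x)
r$lengths                           # 1 1 2 1 1
r$values                            # 0 1 0 1 0

is0 <- r$values == 0                # TRUE FALSE TRUE FALSE TRUE
r$values[is0] <- seq_len(sum(is0))  # the three 0-runs become 1, 2, 3
r$values                            # 1 1 2 1 3

out <- inverse.rle(r) * !x          # expand the runs, then mask positions where x is 1
out                                 # 1 0 2 2 0 3
```

The final multiplication by !x is needed because the runs of 1s keep their value 1 after the renumbering and would otherwise be mistaken for block 1.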
G. Grothendieck