
I have a dataset in R as below:

id <- c(1,1,1,1,1,2,2,2,2,3,3)
time <- c(2000,2001,2002,2003,2004,2000,2001,2002,2003,2000,2001)
group <- c(0,0,1,0,0,0,1,0,1,0,0)
df_temp <- data.frame(id, time, group)

I would like to create a new variable called "n" that counts rows within each run of "group" and restarts at 1 every time "group" switches from 0 to 1 or from 1 to 0 (and at the start of each "id"), as below:

n <- c(1,2,1,1,2,1,1,1,1,1,2)

Could you please suggest how I could generate the variable "n" using the dplyr package in R? Thanks very much in advance.

I tried:

df_temp2 <- 
   df_temp %>%
   arrange(id, time, group) %>%
   group_by(group) %>%
   mutate(n=seq_along(group))

but "n" does not come out as I expected.

1 Answer

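library(dplyr)

df_temp %>%
  # grp increases by 1 whenever group differs from the previous row
  group_by(id, grp = cumsum(group != lag(group, default = TRUE))) %>%
  mutate(n = row_number()) %>%  # count rows within each (id, grp) run
  ungroup() %>%
  select(-grp)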

      id  time group     n
   <dbl> <dbl> <dbl> <int>
 1     1  2000     0     1
 2     1  2001     0     2
 3     1  2002     1     1
 4     1  2003     0     1
 5     1  2004     0     2
 6     2  2000     0     1
 7     2  2001     1     1
 8     2  2002     0     1
 9     2  2003     1     1
10     3  2000     0     1
11     3  2001     0     2
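
To see why this works, it can help to keep the helper columns visible. The following is only a sketch on the question's data; `changed`, `grp` and `steps` are illustrative names, not anything from the original post. `changed` flags rows where `group` differs from the previous row, and `grp` is its running total, so every run of identical values gets its own number.

```r
library(dplyr)

# Same data as in the question
df_temp <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  time  = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2000, 2001),
  group = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
)

steps <- df_temp %>%
  mutate(
    changed = group != lag(group, default = TRUE),  # TRUE where a new run starts
    grp     = cumsum(changed)                       # run number over the whole table
  ) %>%
  group_by(id, grp) %>%             # id is needed too: a run can span two ids
  mutate(n = row_number()) %>%      # position within each (id, grp) run
  ungroup()

print(steps)
```

Note that `grp` is computed over the whole table, so the last rows of id 1 and the first row of id 2 share the same run number; grouping by `id` as well is what makes the counter restart there.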
In dplyr 1.1.0 or later, you could also use `consecutive_id` instead of the `grp` calculation with lag/cumsum, e.g.: https://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid/74428002 – thelatemail Feb 02 '23 at 21:56
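
A minimal sketch of the `consecutive_id()` variant mentioned in the comment, assuming dplyr 1.1.0 or later is installed and that the rows are already ordered by `id` and `time`. The names `run` and `result` are illustrative. Passing both `id` and `group` to `consecutive_id()` is a choice made here so that a change in either column starts a new run, which means no separate grouping by `id` is needed.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Same data as in the question
df_temp <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  time  = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2000, 2001),
  group = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
)

result <- df_temp %>%
  # a new run id starts whenever id or group differs from the previous row
  group_by(run = consecutive_id(id, group)) %>%
  mutate(n = row_number()) %>%   # position within each run
  ungroup() %>%
  select(-run)

print(result)
```

On this data the `n` column should match the one requested in the question.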