
I have a data frame with two columns (ident and value). I would like to create a counter that restarts every time the ident value changes, and also every time value changes within an ident. Here is an example to make it clear:

# ident value counter
#-------------------- 
#  1     0       1
#  1     0       2
#  1     1       1
#  1     1       2
#  1     1       3
#  1     0       1
#  1     1       1
#  1     1       2
#  2     1       1
#  2     0       1
#  2     0       2
#  2     0       3

I've tried the plyr package:

library(plyr)
ddply(mydf, .(ident, value), transform, .id = seq_along(ident))

I get the same result with the data.table package.

nrussell
Demerzel
    This will not handle the repeated (1,1) group on the 7th row; those rows will be counted as 4, 5. – OmaymaS Dec 08 '16 at 12:41
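A quick base-R sketch of the problem the comment describes. It uses `ave()` in place of `ddply()` (a swap for illustration only; the grouping is the same). Grouping by `ident` and `value` together pools every row with the same pair, even when the rows are not consecutive:

```r
# Example data from the question
d <- data.frame(
  ident = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0)
)

# Number rows within each (ident, value) group. Rows 7-8 join the
# earlier (1, 1) rows 3-5, so they get 4 and 5 instead of restarting at 1.
wrong <- ave(seq_len(nrow(d)), d$ident, d$value, FUN = seq_along)
wrong
# [1] 1 2 1 2 3 3 4 5 1 1 2 3
```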

2 Answers


A data.table alternative uses the rleid and rowid functions. rleid creates a run-length id for consecutive equal values, which can then be used as a grouping variable. Either 1:.N or rowid can then create the counter:

library(data.table)
# option 1:
setDT(d)[, counter := 1:.N, by = .(ident,rleid(value))]
# option 2:
setDT(d)[, counter := rowid(ident, rleid(value))]

which both give:

> d
    ident value counter
 1:     1     0       1
 2:     1     0       2
 3:     1     1       1
 4:     1     1       2
 5:     1     1       3
 6:     1     0       1
 7:     1     1       1
 8:     1     1       2
 9:     2     1       1
10:     2     0       1
11:     2     0       2
12:     2     0       3
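For readers without data.table, here is a base-R sketch of the same idea (an illustration, not part of the original answer). The helper `run_id()` is hypothetical. It stands in for `rleid()` by numbering runs with `rle()`, and `ave()` plays the role of `1:.N` within each (ident, run) group. The sketch also shows why `ident` has to stay in the grouping:

```r
# Example data from the question
d <- data.frame(
  ident = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0)
)

# Hypothetical base-R stand-in for data.table::rleid(): number each run of equal values
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

run_id(d$value)
# [1] 1 1 2 2 2 3 4 4 4 5 5 5

# Rows 7-9 share run 4 although row 9 belongs to ident 2, so ident is kept
# in the grouping; the counter then restarts in every (ident, run) group.
d$counter <- ave(seq_len(nrow(d)), d$ident, run_id(d$value), FUN = seq_along)
d$counter
# [1] 1 2 1 2 3 1 1 2 1 1 2 3
```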

With dplyr it is a bit less straightforward:

library(dplyr)
d %>% 
  group_by(ident, val.gr = cumsum(value != lag(value, default = first(value)))) %>% 
  mutate(counter = row_number()) %>% 
  ungroup() %>% 
  select(-val.gr)

Instead of the cumsum function, you could also use rleid from data.table.
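A sketch of that variant, assuming both dplyr and data.table are installed. `rleid()` is called as `data.table::rleid()` without attaching data.table, so that data.table does not mask dplyr functions such as `first()`:

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  ident = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0)
)

res <- d %>%
  group_by(ident, val.gr = data.table::rleid(value)) %>%  # run id instead of the cumsum() trick
  mutate(counter = row_number()) %>%                      # count rows within each run
  ungroup() %>%
  select(-val.gr)

res$counter
# [1] 1 2 1 2 3 1 1 2 1 1 2 3
```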


Used data:

d <- structure(list(ident = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), 
                    value = c(0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L)), 
               .Names = c("ident", "value"), class = "data.frame", row.names = c(NA, -12L))
Jaap

We can paste the two columns together and use the lengths component of rle to get the length of each run of consecutive identical keys. We then use sequence to generate the counter.

df$counter <- sequence(rle(paste0(df$ident, df$value))$lengths)
df
#  ident value counter
#1     1     0       1
#2     1     0       2
#3     1     1       1
#4     1     1       2
#5     1     1       3
#6     1     0       1
#7     1     1       1
#8     1     1       2
#9     2     1       1
#10    2     0       1
#11    2     0       2
#12    2     0       3
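Breaking the one-liner into steps may make the mechanics clearer. This sketch (not from the original answer) also joins the columns with `paste(..., sep = "_")` instead of `paste0()`. With no separator, different pairs such as (1, 11) and (11, 1) both become `"111"` and would be treated as one run. With the 0/1 values in the question this cannot happen, so the separator is only a precaution:

```r
# Example data from the question
df <- data.frame(
  ident = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  value = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0)
)

key  <- paste(df$ident, df$value, sep = "_")  # one key per row: "1_0" "1_0" "1_1" ...
lens <- rle(key)$lengths                       # run lengths: 2 3 1 2 1 3
df$counter <- sequence(lens)                   # 1 2 1 2 3 1 1 2 1 1 2 3

# Without a separator, two different pairs collide into one key
paste0(c(1, 11), c(11, 1))            # "111" "111"
paste(c(1, 11), c(11, 1), sep = "_")  # "1_11" "11_1"
```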
Ronak Shah