Sequentially count same groups in one column in R

Question

I have a data frame with several columns. I need to number the consecutive runs in col2 so that every time the label changes from a to b, or from b to a, a new group starts with a new number. The expected result is shown in the Desired column.

testdf <- data.frame(mydate = seq(as.Date('2012-01-01'), 
                                  as.Date('2012-01-10'), by = 'day'),
                     col1 = 1:10,
                     col2 = c("a","a","b","b","a","b","a","b","a","a"),
                     Desired= c(1,1,2,2,3,4,5,6,7,7))

       mydate col1 col2 Desired
1  2012-01-01    1    a       1
2  2012-01-02    2    a       1
3  2012-01-03    3    b       2
4  2012-01-04    4    b       2
5  2012-01-05    5    a       3
6  2012-01-06    6    b       4
7  2012-01-07    7    a       5
8  2012-01-08    8    b       6
9  2012-01-09    9    a       7
10 2012-01-10   10    a       7

Is there a way to solve this without for loops? The dataset has more than 1 million rows.

I think this is a duplicate question, but here's one way: `r <- rle(as.character(testdf$col2)); r$values <- seq_along(r$values); inverse.rle(r)` There is also a nice function for this `rleid` in the `data.table` package. — Frank, Jul 09 '15 at 16:28
General advice: with that many records, you should consider using data tables instead of dataframes (for code elegance and computational efficiency), [see this](http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf) — shekeine, Jul 09 '15 at 16:35
As @Frank mentioned, just `library(data.table) ; rleid(testdf$col2)` should do (with the devel version) — David Arenburg, Jul 09 '15 at 18:49
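The run-length approach suggested in the comments above can be sketched as follows. This is a minimal illustration on the question's col2 values, not a benchmarked solution; the variable names `col2` and `run_id` are chosen here for the example, and the `rleid` call is only attempted if an installed `data.table` is found.

```r
# Labels from the question, stored as a plain character vector
col2 <- c("a", "a", "b", "b", "a", "b", "a", "b", "a", "a")

# rle() collapses the vector into runs: their lengths and their values
r <- rle(col2)

# Replace each run's label with its position (1, 2, 3, ...)
r$values <- seq_along(r$values)

# inverse.rle() expands the runs back to the original length
run_id <- inverse.rle(r)
print(run_id)

# Optional: the same numbering via data.table::rleid, if the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  print(data.table::rleid(col2))
}
```

Both calls are vectorised, so no explicit loop over the rows is needed.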

RHertel · Answer 1 · 2015-07-09T16:46:32.300

1

You could try this:

# factor() makes this work whether col2 is stored as character or as factor
output <- c(0, cumsum(diff(as.numeric(factor(testdf$col2))) != 0)) + 1
#> output
#[1] 1 1 2 2 3 4 5 6 7 7
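A variant of the same idea that avoids the numeric conversion is to compare each element with its predecessor directly. The helper name `consecutive_runs` is hypothetical and introduced only for this sketch; it assumes a non-empty vector without NA values.

```r
# Hypothetical helper: number consecutive runs of equal values.
# Assumes x is non-empty and contains no NA.
consecutive_runs <- function(x) {
  # TRUE wherever an element differs from the one before it
  changed <- x[-1] != x[-length(x)]
  # The first element always starts run 1; each change starts a new run
  c(1L, cumsum(changed) + 1L)
}

col2 <- c("a", "a", "b", "b", "a", "b", "a", "b", "a", "a")
run_id <- consecutive_runs(col2)
print(run_id)
```

Because the comparison works on the values themselves, this also applies to columns with more than two distinct labels. NA values would need separate handling, since a comparison with NA yields NA.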

edited Jul 09 '15 at 16:46

answered Jul 09 '15 at 16:36

RHertel


score 1 · Answer 2 · answered Jul 09 '15 at 16:43

This is a more in vogue way of doing this.

library(dplyr)
testdf %>% group_by(col2) %>% mutate(first = cumsum(as.numeric(col2)))

daniel


It may be "en vogue", but are you sure that this produces the desired output? If I remove the target column with `testdf <- testdf[,-4]` and, following your command sequence, run `p <- testdf %>% group_by(col2) %>% mutate(first = cumsum(as.numeric(col2)))`, the result for `p` on my computer does not much resemble the desired output. – RHertel Jul 09 '15 at 17:05
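As the comment points out, grouping by col2 puts all "a" rows into one group regardless of position, so it cannot number consecutive runs. One way a dplyr pipeline could produce the desired numbering is to flag the rows where the label changes and take a cumulative sum. This is a sketch that assumes dplyr is installed; the column name `run_id` is chosen here for illustration and is not from the original answer.

```r
# Sketch only: runs when the dplyr package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  testdf <- data.frame(col1 = 1:10,
                       col2 = c("a", "a", "b", "b", "a", "b", "a", "b", "a", "a"),
                       stringsAsFactors = FALSE)

  result <- testdf %>%
    # lag() shifts col2 down by one row; the default fills the first row
    # with its own value so that it never counts as a change
    mutate(run_id = cumsum(col2 != lag(col2, default = first(col2))) + 1L)

  print(result$run_id)
}
```

No `group_by()` is used, because the numbering depends on row order rather than on the label itself.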
