Create group number for contiguous runs of equal values

Question

Is there a faster way to create a group counter than using a loop? Every element in a contiguous run of equal values should get the same index. I find the loop very slow, especially when the data are large.

For illustration, here are the input and the desired output:

x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)

Desired resulting counter:

c(1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 8, 8, 9)

Note that non-contiguous runs get different indexes; see, for example, the desired indexes of the values 2 and 4.
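To make the distinction concrete, here is a value-based grouping for contrast. The base R idiom `match(x, unique(x))` gives every occurrence of the same value one shared id, so both 2s (and all 4s) end up in the same group. That is *not* the requested output, but it shows why run-based ids need a different approach.

```r
x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
# Value-based id: all occurrences of a value share one id,
# numbered in order of first appearance (not run-based)
by_value <- match(x, unique(x))
by_value
# [1] 1 2 3 1 4 4 2 4 4 5 5 5 6
```

Positions 1 and 4 (both the value 2) share id 1 here, whereas the desired run-based counter gives them 1 and 4.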

My inefficient code is this:

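n <- length(x)
group <- numeric(n)   # preallocate the result vector
group[1] <- 1
counter <- 1
for (i in 2:n) {
  if (x[i] == x[i - 1]) {
    group[i] <- counter        # same run: reuse the current id
  } else {
    counter <- counter + 1     # value changed: start a new run
    group[i] <- counter
  }
}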

score 13 · Answer 1 · edited Jan 23 '17 at 17:30


Using data.table, which has the function rleid():

library(data.table)  # rleid() needs v1.9.5+
x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
rleid(x)
#  [1] 1 2 3 4 5 5 6 7 7 8 8 8 9

answered May 19 '15 at 00:21 by Arun · edited Jan 23 '17 at 17:30 by Frank

score 12 · Accepted Answer · answered May 19 '15 at 00:18


If you have numeric values like these, you can use diff() to flag each change in value and cumsum() to count the changes:

x <- c(2,3,9,2,4,4,3,4,4,5,5,5,1)
cumsum(c(1,diff(x)!=0))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
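Because diff() only works on numeric input, a variant of the same cumulative-sum idea compares each element directly with its predecessor, which also works for character and logical vectors. The sketch below uses a hypothetical helper name, `run_id()`, and assumes the input contains no `NA`s (an `NA` comparison yields `NA`, which cumsum() then carries into every later position). It also returns an empty vector for empty input, where `cumsum(c(1, diff(x) != 0))` would return `1`.

```r
# run_id() is a hypothetical helper, not a function from any package.
# Assumes x is an atomic vector with no NAs.
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  # TRUE wherever an element differs from the one before it
  changed <- x[-1L] != x[-n]
  # position 1 starts group 1; every change starts the next group
  cumsum(c(1L, changed))
}

run_id(c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
run_id(c("a", "a", "b", "a"))
# [1] 1 1 2 3
```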

answered May 19 '15 at 00:18 by MrFlick

Definitely faster than my answer. Can't assess against Arun's `data.table` answer at the moment. – Jota May 19 '15 at 01:00
Thanks, Frank, for that comment. I will use MrFlick's suggestion for now; Arun's data.table suggestion seems to require installing a package first. – Rens May 19 '15 at 01:09

Yep, if you want to try Arun's solution see this link for help on installation: https://github.com/Rdatatable/data.table/wiki/Installation – Jota May 19 '15 at 01:13

@Frank, pushed an even faster version of `rleid()` yesterday which is also memory efficient. Here, `diff(x)`, `c(...)`, `!=` and `cumsum()` each allocate new memory, meaning it requires ~4x the original data in space!! – Arun May 29 '15 at 11:49

score 6 · Answer 3 · edited May 19 '15 at 01:16

This works with numeric or character values:

rep(1:length(rle(x)$values), times = rle(x)$lengths)
#[1] 1 2 3 4 5 5 6 7 7 8 8 8 9

You can be a bit more efficient by calling rle() only once (about 2x faster), and using rep.int() instead of rep() gives a further, very slight speed-up:

y <- rle(x)
rep.int(1:length(y$values), times = y$lengths)
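To see why this works: rle() returns a list with `lengths` (how long each run is) and `values` (the value of each run). Numbering the runs 1, 2, 3, ... and repeating each number by its run length yields the counter. The sketch below uses seq_along() to number the runs, which is equivalent to `1:length(...)` for non-empty input.

```r
x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
y <- rle(x)
y$lengths  # length of each run
# [1] 1 1 1 1 2 1 2 3 1
y$values   # value shared by each run
# [1] 2 3 9 2 4 3 4 5 1
# one id per run, repeated by that run's length
ids <- rep.int(seq_along(y$lengths), times = y$lengths)
ids
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
```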

answered May 19 '15 at 00:27 by Jota · edited May 19 '15 at 01:16

score 2 · Answer 4 · answered Mar 12 '21 at 11:53


Jota's answer above can be simplified further, which should also make it faster:

with(rle(x), rep(1:length(lengths), lengths))
#  [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
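One edge case worth guarding against: for a zero-length input, `1:length(lengths)` evaluates to `1:0`, i.e. `c(1, 0)`, which is not an empty vector. Swapping in seq_along(), as in this sketch, gives an empty result for empty input and the same counter otherwise. Inside with(), `lengths` refers to the list element returned by rle(), not the base function of the same name.

```r
empty <- numeric(0)
# 1:length(...) counts down when the length is 0
1:length(rle(empty)$lengths)
# [1] 1 0
# seq_along() returns an empty integer vector instead
with(rle(empty), rep(seq_along(lengths), lengths))
# integer(0)

x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
with(rle(x), rep(seq_along(lengths), lengths))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
```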

answered Mar 12 '21 at 11:53 by AnilGoyal

score 1 · Answer 5 · edited Jul 18 '23 at 11:12

With dplyr, you can use consecutive_id:

library(dplyr) #1.1.0+
consecutive_id(x)
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9

answered Mar 06 '23 at 17:35 by Maël · edited Jul 18 '23 at 11:12
