
I have an R data frame:

a <- 1:12  
list <- c(rep("x",3),rep("y",4),rep("z",3),rep("x",2))  
data <- data.frame(a,list)

data
    a list
1   1    x
2   2    x
3   3    x
4   4    y
5   5    y
6   6    y
7   7    y
8   8    z
9   9    z
10 10    z
11 11    x
12 12    x

I want to create a new column that restarts counting at 1 every time the value of "list" changes. For this example, the new column would be:

b <- c(1:3,1:4,1:3,1:2)    
data <- data.frame(a,list,b)  

I am far from being an expert in R and cannot for the life of me work out an efficient way of doing this. My main problem seems to be that any value of "list" can come back at any time, and there is no rule for the length of a block of one value. Does anyone have any ideas? Thanks!

Jilber Urbina
Lucy Vanes

2 Answers


I would use rle() to get the run lengths of list and then use the handy sequence() function to generate the desired counter from the $lengths component returned by rle():

R> sequence(rle(as.character(data$list))$lengths)
 [1] 1 2 3 1 2 3 4 1 2 3 1 2

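Notice that we have to convert list to an atomic vector (a character vector in my case), because rle() does not accept a factor. The as.character() call is also harmless when list is already a character vector, which is what data.frame() produces by default since R 4.0.0.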
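To see what each step produces, here is a minimal sketch that rebuilds the example data from the question and inspects the intermediate rle() result. The names runs, b and res are only illustrative.

```r
# Rebuild the example data from the question
a <- 1:12
list <- c(rep("x", 3), rep("y", 4), rep("z", 3), rep("x", 2))
data <- data.frame(a, list)

# Step 1: run-length encoding of the grouping column
runs <- rle(as.character(data$list))
runs$lengths  # 3 4 3 2
runs$values   # "x" "y" "z" "x"

# Step 2: sequence() turns each run length n into 1:n and concatenates the results
b <- sequence(runs$lengths)

# rle() rejects factors, which is why as.character() is used in step 1
res <- tryCatch(rle(factor(data$list)), error = function(e) e)
inherits(res, "error")  # TRUE
```

Note that the second run of "x" gets its own entry in runs$values, which is exactly why the counter restarts there.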

To add the counter to data as a new column, wrap this in a call such as

data <- transform(data, b = sequence(rle(as.character(list))$lengths))

which gives

R> data <- transform(data, b = sequence(rle(as.character(list))$lengths))
R> data
    a list b
1   1    x 1
2   2    x 2
3   3    x 3
4   4    y 1
5   5    y 2
6   6    y 3
7   7    y 4
8   8    z 1
9   9    z 2
10 10    z 3
11 11    x 1
12 12    x 2
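A different technique that avoids rle() altogether (an alternative sketch, not part of the approach above) is to mark where list changes, turn those marks into a run identifier with cumsum(), and then number the rows within each run using ave(). This sketch assumes list contains no NA values, since a comparison with NA would break the run detection; the names x, starts and run_id are illustrative.

```r
a <- 1:12
list <- c(rep("x", 3), rep("y", 4), rep("z", 3), rep("x", 2))
data <- data.frame(a, list)

# TRUE wherever list differs from the previous row (the first row always starts a run)
x <- as.character(data$list)
starts <- c(TRUE, x[-1] != x[-length(x)])

# cumsum() over the logical vector gives a run id: 1 1 1 2 2 2 2 3 3 3 4 4
run_id <- cumsum(starts)

# Within each run, seq_along() numbers the rows 1, 2, ...
data$b <- ave(seq_along(run_id), run_id, FUN = seq_along)
data$b
```

Because the last block of "x" gets run id 4 rather than 1, it is counted separately from the first block, just as with rle().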
Gavin Simpson

The key idea is to use rle() (run length encoding) on data$list, after coercing it to an atomic vector (we only care where the values change, not what they are). Then we use seq() to create a sequence from 1 to each run length. Finally, we concatenate all these sequences with unlist():

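# factor() first, so this works whether list is a factor or a character vector
# (as.numeric() on a character vector would just give NA)
unlist(lapply(rle(as.numeric(factor(data$list)))$lengths, FUN = seq, from = 1))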
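As a quick check, this sketch (assuming the question's data) compares the lapply()/seq() version with seq_len() and with sequence(), which should all give the same counter. The variable names are illustrative.

```r
a <- 1:12
list <- c(rep("x", 3), rep("y", 4), rep("z", 3), rep("x", 2))
data <- data.frame(a, list)

# Run lengths of the integer codes of list
lens <- rle(as.numeric(factor(data$list)))$lengths

# lapply() calls seq(n, from = 1) for each run length n, i.e. 1:n
via_seq <- unlist(lapply(lens, FUN = seq, from = 1))

# seq_len(n) also yields 1:n (and integer(0) for n = 0)
via_seq_len <- unlist(lapply(lens, seq_len))

# sequence() performs the same expansion in a single call
via_sequence <- sequence(lens)
```

seq_len() is a slightly safer choice than seq() here, because seq(1, 0) counts down to give c(1, 0), whereas seq_len(0) returns an empty vector.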
Stephan Kolassa