
In R, what would be the most efficient/simplest way to count runs of identical elements in a sequence?

For example, how would I count the lengths of the runs of consecutive zeros in a sequence of non-negative integers:

x <- c(1,0,0,0,1,0,0,0,0,0,2,0,0) # should give 3,5,2
asked by andrekos; edited by Amicable
  • Do you want answers in R? If so, it's probably wise to start the question with "In R ..." rather than just having an R tag. – slim Oct 01 '09 at 11:40
  • Note: this doesn't work with runs of NAs or NaNs (they always get treated as non-contiguous). An ugly hack workaround would be to assign NAs and NaNs to some sentinel integer values. – smci Apr 09 '12 at 21:06
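A minimal sketch of the sentinel workaround smci describes. It assumes the real data are non-negative, so -1 and -2 never occur and can stand in for NA and NaN; the sentinel values and the names `x_filled`, `na_runs` and `nan_runs` are arbitrary choices for illustration. Separate sentinels keep an NA run and an adjacent NaN run from merging.

```r
# Assumption: real data are non-negative, so -1 and -2 are safe sentinels.
x <- c(1, NA, NA, 0, 0, NaN, NaN, 1)

rle(x)$lengths   # [1] 1 1 1 2 1 1 1  (every NA/NaN becomes its own run)

x_filled <- x
x_filled[is.na(x) & !is.nan(x)] <- -1   # sentinel for NA
x_filled[is.nan(x)] <- -2               # sentinel for NaN (is.na() is also TRUE for NaN)

r <- rle(x_filled)
na_runs  <- r$lengths[r$values == -1]   # length of each NA run: 2
nan_runs <- r$lengths[r$values == -2]   # length of each NaN run: 2
```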

2 Answers

Use rle():

y <- rle(c(1,0,0,0,1,0,0,0,0,0,2,0,0))
y$lengths[y$values==0]
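rle() returns an object with two parallel vectors, `lengths` and `values`, one entry per run, so indexing `lengths` by a condition on `values` keeps only the runs of interest. A small sketch wrapping that idea; `count_runs()` is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper (not part of base R): lengths of the runs of `value` in `x`.
count_runs <- function(x, value) {
  r <- rle(x)                    # run-length encoding: r$lengths, r$values
  r$lengths[r$values == value]   # keep only the runs whose value matches
}

x <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0)
count_runs(x, 0)   # [1] 3 5 2
count_runs(x, 1)   # [1] 1 1
```

As the comment on the question notes, this inherits rle()'s handling of NA and NaN, which are never merged into a single run.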
answered by Rob Hyndman; edited by Jaap
  • And how would you plot a histogram from this data? Imagine I have numbers from 1 to 100 and sequences of different lengths, and I want a histogram showing how often runs of each length occur, how often each number occurs, or both. – skan Mar 24 '15 at 13:35
  • This is not the place for a new question. – Rob Hyndman Mar 24 '15 at 23:28

This can be done in an efficient way by using indexes of where the values change:

x <- c(1,0,0,0,1,2,1,0,0,1,1)

Find where the values change:

diffs <- x[-1L] != x[-length(x)]

Get the indexes, and then get the difference in subsequent indexes:

idx <- c(which(diffs), length(x))
diff(c(0, idx))
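The steps above return the length of every run, not just the runs of zeros. Because `idx` holds the last position of each run, `x[idx]` is the value each run carries, which allows the same filtering as with rle(). A sketch applied to the question's vector; the names `run_len` and `run_val` are illustrative.

```r
x <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0)

diffs   <- x[-1L] != x[-length(x)]     # TRUE where x[i] differs from x[i + 1]
idx     <- c(which(diffs), length(x))  # last position of each run
run_len <- diff(c(0, idx))             # length of each run
run_val <- x[idx]                      # value carried by each run

run_len[run_val == 0]                  # [1] 3 5 2
```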
answered by Shane
  • That's essentially what rle() is doing. – Rob Hyndman Oct 01 '09 at 11:47
  • Sorry Rob. Wrote that on my iPhone earlier, and there's no "app for that". :). Please vote for Rob's answer instead of mine! – Shane Oct 01 '09 at 12:21
  • +1: While `rle()` is an easier way to answer the OP's question, this solution has other advantages for some cases. In particular, I was looking for a way to number each run uniquely rather than counting the runs and I found I could do that with `c(0,cumsum(x[-1L] != x[-length(x)]))`. – Simon Mar 08 '13 at 22:37
  • Thank you Shane, very helpful :) – Tommaso Guerrini Feb 13 '17 at 15:30
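A sketch of Simon's run numbering on the question's vector. His expression gives every element a 0-based id for the run it belongs to. The rle()-based equivalent and the final table() step are additions for illustration, not from the thread; they show that the ids agree with rle() and that counting elements per id recovers the zero-run lengths.

```r
x <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0)

# Simon's expression: a 0-based run id for every element of x
run_id <- c(0, cumsum(x[-1L] != x[-length(x)]))
run_id                            # [1] 0 1 1 1 2 3 3 3 3 3 4 5 5

# The same numbering (1-based) built from rle()
r <- rle(x)
run_id_rle <- rep(seq_along(r$lengths), r$lengths)
all(run_id_rle == run_id + 1)     # [1] TRUE

# Counting elements per run id among the zeros gives the zero-run lengths
as.vector(table(run_id[x == 0]))  # [1] 3 5 2
```

Per-element ids like these are handy as a grouping variable, for example with tapply() or split(), when each run needs its own summary rather than just a length.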