
Suppose that my vector numbers contains c(1,2,3,5,7,8), and I wish to find if it contains 3 consecutive numbers, which in this case, are 1,2,3.

numbers = c(1,2,3,5,7,8)
difference = diff(numbers) # The difference output would be 1,1,2,2,1

To verify that there are 3 consecutive integers in my numbers vector, I've tried the following, with little success.

rep(1,2)%in%difference 

The above code works in this case, but if my difference vector were c(1,2,2,2,1), it would still return TRUE even though the 1s are not consecutive.
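The underlying reason is that `%in%` checks each element on its left for membership on its own; it does not look for a contiguous subsequence. A minimal illustration, where `spread_out` is a name made up for this example:

```r
# %in% asks, for each element on the left: "does this value occur anywhere on the right?"
spread_out <- c(1, 2, 2, 2, 1)      # the two 1s are not adjacent
hits <- rep(1, 2) %in% spread_out   # one answer per element of rep(1, 2)
hits
# [1] TRUE TRUE
```

Because the result has one entry per element of `rep(1, 2)`, it says nothing about where the 1s sit in the difference vector.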

flodel
Bonnie

5 Answers

25

Using diff and rle, something like this should work:

result <- rle(diff(numbers))
any(result$lengths>=2 & result$values==1)
# [1] TRUE

In response to the comments below, my previous answer was specifically only testing for runs of length==3 excluding longer lengths. Changing the == to >= fixes this. It also works for runs involving negative numbers:

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> result <- rle(diff(numbers4))
> any(result$lengths>=2 & result$values==1)
[1] TRUE
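For reference, this is what `rle` returns for the vector from the question, which may make the test above easier to read. It is only an illustration; `runs` and `has_three` are names chosen for the example.

```r
numbers <- c(1, 2, 3, 5, 7, 8)
runs <- rle(diff(numbers))   # diff(numbers) is 1 1 2 2 1

runs$lengths                 # how long each run of equal differences is
# [1] 2 2 1
runs$values                  # the difference repeated within each run
# [1] 1 2 1

# A run of two 1s in the differences corresponds to three consecutive integers.
has_three <- any(runs$lengths >= 2 & runs$values == 1)
has_three
# [1] TRUE
```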
thelatemail
  • 1
    This does not seem to work properly for this case `numbers = c(-2,2,3,5,6,7,8)` – ECII Apr 20 '13 at 08:17
  • 1
    Ok, but it's usually not good form to (-1) without giving a chance to correct. – thelatemail Apr 20 '13 at 09:45
  • 2
    sorry about that. Apparently pressing down after upvoting leads to downvote and not to (what I expected) retract the upvote. Nice correction. Upvoted again. – ECII Apr 20 '13 at 11:42
  • 2
    To not grow a potentially large vector (# of unique values of `diff`), you could apply `rle` on `diff(numbers) == 1` directly. – flodel Apr 20 '13 at 11:46
  • I think you can check whether the length of the `result$values` is also 1, ensuring vectors of any size are consecutive e.g. `any(r$lengths>=2 & length(r$values)==1 & r$values==1)` – Guillermo Luque Aug 12 '20 at 12:55
15

Benchmarks!

I am including a couple functions of mine. Feel free to add yours. To qualify, you need to write a general function that tells if a vector x contains n or more consecutive numbers. I provide a unit test function below.


The contenders:

flodel.filter <- function(x, n, incr = 1L) {
  if (n > length(x)) return(FALSE)
  x <- as.integer(x)
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(filter(is.cons, rep(1L, n-1L), sides = 1, method = "convolution") == n-1L,
      na.rm = TRUE)
}

flodel.which <- function(x, n, incr = 1L) {
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(diff(c(0L, which(!is.cons), length(x))) >= n)
}

thelatemail.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(x))
  any(result$lengths >= n-1L  & result$values == incr)
}

improved.rle <- function(x, n, incr = 1L) {
  result <- rle(diff(as.integer(x)) == incr)
  any(result$lengths >= n-1L  & result$values)
}

carl.seqle <- function(x, n, incr = 1) {
  if(!is.numeric(x)) x <- as.numeric(x) 
  z <- length(x)  
  y <- x[-1L] != x[-z] + incr 
  i <- c(which(y | is.na(y)), z) 
  any(diff(c(0L, i)) >= n)
}

Unit tests:

check.fun <- function(fun)
  stopifnot(
    fun(c(1,2,3),   3),
   !fun(c(1,2),     3),
   !fun(c(1),       3),
   !fun(c(1,1,1,1), 3),
   !fun(c(1,1,2,2), 3),
    fun(c(1,1,2,3), 3)
  )

check.fun(flodel.filter)
check.fun(flodel.which)
check.fun(thelatemail.rle)
check.fun(improved.rle)
check.fun(carl.seqle)

Benchmarks:

x <- sample(1:10, 1000000, replace = TRUE)

library(microbenchmark)
microbenchmark(
  flodel.filter(x, 6),
  flodel.which(x, 6),
  thelatemail.rle(x, 6),
  improved.rle(x, 6),
  carl.seqle(x, 6),
  times = 10)

# Unit: milliseconds
#                   expr       min       lq   median       uq      max neval
#    flodel.filter(x, 6)  96.03966 102.1383 144.9404 160.9698 177.7937    10
#     flodel.which(x, 6) 131.69193 137.7081 140.5211 185.3061 189.1644    10
#  thelatemail.rle(x, 6) 347.79586 353.1015 361.5744 378.3878 469.5869    10
#     improved.rle(x, 6) 199.35402 200.7455 205.2737 246.9670 252.4958    10
#       carl.seqle(x, 6) 213.72756 240.6023 245.2652 254.1725 259.2275    10
flodel
  • 2
    Brain clumsy this AM. How many of these funcs can handle increments other than 1? (Shameless plug for the flexibility of `seqle` :-) – Carl Witthoft Apr 20 '13 at 14:29
  • 1
    @Carl, all of them can; I have modified the functions to take an optional increment input like yours do. I have also added a version of your function. Feel free to modify it if you think it can be improved. For example, for the purpose of this question, you could use `as.integer` instead of `as.numeric`. – flodel Apr 20 '13 at 17:16
  • @flodel `flodel.filter` and `flodel.which` produce different outputs. E.g., `flodel.filter(cbind(1,2,5), 3)` evaluates (correctly) to `False` whereas `flodel.which(cbind(1,2,5), 3)` evaluates (incorrectly) to `True`. That means, `flodel.which` is not respecting the `incr` parameter at all. I have not tested the other functions. – Kalaschnik Mar 16 '20 at 08:53
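The discrepancy reported in the last comment appears to come from the matrix input rather than from `incr`. `cbind(1, 2, 5)` is a 1-by-3 matrix, and `head`/`tail` drop rows of a matrix, not elements. `flodel.filter` calls `as.integer(x)` first, which strips the dimensions, while `flodel.which` does not. A small sketch of this reading (the function is copied from above so the block runs on its own; `m`, `as_matrix` and `as_vector` are names made up here):

```r
flodel.which <- function(x, n, incr = 1L) {
  is.cons <- tail(x, -1L) == head(x, -1L) + incr
  any(diff(c(0L, which(!is.cons), length(x))) >= n)
}

m <- cbind(1, 2, 5)     # a 1 x 3 matrix, not a plain vector
length(tail(m, -1L))    # 0: tail() removed the only row

as_matrix <- flodel.which(m, 3)             # TRUE, the result reported in the comment
as_vector <- flodel.which(as.vector(m), 3)  # FALSE once the dimensions are dropped
```

Under this reading, passing a plain vector (or calling `as.vector` first) is enough to make the two functions agree on this input.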
11

After diff, you can check for any two consecutive 1s:

numbers = c(1,2,3,5,7,8)

difference = diff(numbers) == 1
## [1]  TRUE  TRUE FALSE FALSE  TRUE

## find at least one pair of consecutive TRUEs
any(tail(difference, -1) &
    head(difference, -1))

## [1] TRUE
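Since the principle may not be obvious at first sight, here is the same computation taken apart. Dropping the last element and dropping the first element gives two copies of the vector shifted by one position, so `&` compares every element with its right-hand neighbour. The names `left`, `right` and `pairs` are only for illustration.

```r
difference <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

left  <- head(difference, -1)   # elements 1..4:  TRUE  TRUE FALSE FALSE
right <- tail(difference, -1)   # elements 2..5:  TRUE FALSE FALSE  TRUE

pairs <- left & right           # TRUE where elements i and i+1 are both TRUE
pairs
# [1]  TRUE FALSE FALSE FALSE
any(pairs)
# [1] TRUE
```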
flodel
Nishanth
  • +1 very clever. But an explanation would be nice since the principle here is far from obvious. Took me a bit of puzzling to get it. – Konrad Rudolph Apr 20 '13 at 10:02
  • 2
    unfortunately, this can't generalize well to a larger number of consecutive numbers. I hope you won't mind the edit I think it is cleaner this way. – flodel Apr 20 '13 at 11:33
  • @flodel - thanks for the edit. In fact I wanted to use `head` and `tail` but I didn't think of `-1` indexing! – Nishanth Apr 20 '13 at 11:34
7

It's nice to see home-grown solutions here.

Fellow Stack Overflow user Carl Witthoft posted a function he named seqle() and shared it here.

The function looks like this:

seqle <- function(x,incr=1) { 
  if(!is.numeric(x)) x <- as.numeric(x) 
  n <- length(x)  
  y <- x[-1L] != x[-n] + incr 
  i <- c(which(y|is.na(y)),n) 
  list(lengths = diff(c(0L,i)),
       values = x[head(c(0L,i)+1L,-1L)]) 
} 

Let's see it in action. First, some data:

numbers1 <- c(1, 2, 3, 5, 7, 8)
numbers2 <- c(-2, 2, 3, 5, 6, 7, 8)
numbers3 <- c(1, 2, 2, 2, 1, 2, 3)

Now, the output:

seqle(numbers1)
# $lengths
# [1] 3 1 2
# 
# $values
# [1] 1 5 7
# 
seqle(numbers2)
# $lengths
# [1] 1 2 4
# 
# $values
# [1] -2  2  5
# 
seqle(numbers3)
# $lengths
# [1] 2 1 1 3
# 
# $values
# [1] 1 2 2 1
# 

Of particular interest to you is the "lengths" in the result.
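To turn those lengths into a yes/no answer for the original question, one option is a small wrapper like the one below. `has_run` is a hypothetical helper written for this example, not part of Carl's code; `seqle` is repeated so the block runs on its own.

```r
# seqle() copied from above
seqle <- function(x, incr = 1) {
  if (!is.numeric(x)) x <- as.numeric(x)
  n <- length(x)
  y <- x[-1L] != x[-n] + incr
  i <- c(which(y | is.na(y)), n)
  list(lengths = diff(c(0L, i)),
       values = x[head(c(0L, i) + 1L, -1L)])
}

# Hypothetical wrapper: is there a run of at least n values spaced by incr?
has_run <- function(x, n = 3, incr = 1) any(seqle(x, incr)$lengths >= n)

has_run(c(1, 2, 3, 5, 7, 8))   # TRUE: 1, 2, 3
has_run(c(1, 2, 4, 5, 7, 8))   # FALSE: no run is longer than 2
```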

Another interesting point is the incr argument. Here we can set the increment to, say, 2 and look for sequences where the difference between successive numbers is two. So, for the first vector, we would expect the sequence of 3, 5, and 7 to be detected.

Let's try:

> seqle(numbers1, incr = 2)
$lengths
[1] 1 1 3 1

$values
[1] 1 2 3 8

So, with incr = 2, we get runs of length 1 (1), 1 (2), 3 (3, 5, 7), and 1 (8).


How does it work with ECII's second challenge? Seems OK!

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> seqle(numbers4)
$lengths
[1] 3 1 2

$values
[1] -2  5  7
Community
A5C1D2H2I1M1N2O1R2T1
  • If `seqle` is meant to work with numerics, `y` should be replaced with something like `y <- abs(x[-1L] - x[-n] - incr) > .Machine$double.eps ^ 0.5`. Otherwise see what happens with for example `seqle(seq(0, 1, 1/17), incr = 1/17)`. – flodel Apr 20 '13 at 17:29
  • @flodel, That's wickedly demonic! :) Can I ask, how do you detect such scenarios? In other words, what made you decide to test 1/17? – A5C1D2H2I1M1N2O1R2T1 Apr 21 '13 at 04:43
  • My train of thoughts was something along -- why is `carl.seqle` slower than my `flodel.which` although they do essentially the same? I notice I use integers while he deliberately uses numerics. Why would someone want to check that numerics are equally-spaced? Also we know this is bound to floating point issues. Did Carl fall into the trap? Looks like he did: `x[-1L] != x[-n] + incr`. Let's double check with an example. I used 1/17 (an irrational) but could have picked `0.1`. See, nothing magic! – flodel Apr 21 '13 at 12:31
  • Well, FWIW I certainly did not intend for this to be used with non-integers. I was dealing w/ communications theory at the time (duh). Extending this to numeric sequences is interesting indeed. I'll have to play w/ it for a while to see if I can "break" @flodel 's nice enhancement there. – Carl Witthoft Apr 21 '13 at 12:40
  • And in my own sorry defense, I'll point out that the packaged `rle`, whose code I clearly copied, doesn't check for doubles either :-( – Carl Witthoft Apr 21 '13 at 12:47
  • @flodel, thanks for taking the time to respond. It's always interesting to see how others approach these things. I figured you were testing for floating point issues with a fraction like 1/17, but the root of my "wickedly demonic" statement was that the example you picked yielded `$lengths = c(6, 6, 6)` on my system :) – A5C1D2H2I1M1N2O1R2T1 Apr 21 '13 at 15:26
  • Oh wow. I did not even realize. What does it say about me... Bad omen! – flodel Apr 21 '13 at 15:33
  • @flodel -- as long as we're being picky :-), 1/17 is most certainly NOT irrational. But I know what you meant. – Carl Witthoft Apr 21 '13 at 19:51
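The tolerance-based comparison suggested in the first comment can be dropped into the function as follows. This is a sketch under that suggestion only: `seqle_tol` and its `tol` argument are hypothetical names, and the default tolerance is the one quoted in the comment.

```r
# seqle() with the exact comparison replaced by a tolerance test
seqle_tol <- function(x, incr = 1, tol = .Machine$double.eps^0.5) {
  if (!is.numeric(x)) x <- as.numeric(x)
  n <- length(x)
  y <- abs(x[-1L] - x[-n] - incr) > tol   # TRUE where the step is not incr
  i <- c(which(y | is.na(y)), n)
  list(lengths = diff(c(0L, i)),
       values = x[head(c(0L, i) + 1L, -1L)])
}

x <- seq(0, 1, 1/17)               # 18 equally spaced values
res <- seqle_tol(x, incr = 1/17)
res$lengths                        # a single run covering the whole vector
# [1] 18
```

With the exact `!=` comparison the same input may be split into several runs, and how it is split can depend on the platform, which is why no output is shown for that case here.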
5

Simple but works

numbers = c(-2,2,3,4,5,10,6,7,8)
x1<-c(diff(numbers),0)
x2<-c(0,diff(numbers[-1]),0)
x3<-c(0,diff(numbers[c(-1,-2)]),0,0)

rbind(x1,x2,x3)
colSums(rbind(x1,x2,x3))==3 # TRUE at each position in the vector where a triplet of consecutive integers starts
# [1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

sum(colSums(rbind(x1,x2,x3))==3) # How many triplets of consecutive integers occur in the vector
# [1] 3

which(colSums(rbind(x1,x2,x3))==3) # The positions at which the triplets of consecutive integers start
# [1] 2 3 7

Note that this will not work for consecutive negative intervals c(-2,-1,0) because of how diff() works
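As an alternative sketch (not the method used in this answer), `rle` applied to `diff(numbers) == 1` gives both the start position and the length of every run, and it also covers vectors such as `c(-2, -1, 0)`. The variable names below are made up for the example.

```r
numbers <- c(-2, 2, 3, 4, 5, 10, 6, 7, 8)

r <- rle(diff(numbers) == 1)        # runs of "step is exactly 1" / "step is not 1"
ends   <- cumsum(r$lengths)         # index in diff(numbers) where each run ends
starts <- ends - r$lengths + 1L     # index where each run starts
keep   <- r$values & r$lengths >= 2 # at least two unit steps = three consecutive integers

run_start  <- starts[keep]          # position in numbers where each run begins
run_length <- r$lengths[keep] + 1L  # how many integers each run contains
run_start
# [1] 2 7
run_length
# [1] 4 3

# Same test on a short vector of negative values
neg <- rle(diff(c(-2, -1, 0)) == 1)
any(neg$values & neg$lengths >= 2)
# [1] TRUE
```

Here the run starting at position 2 is 2, 3, 4, 5 and the run starting at position 7 is 6, 7, 8; positions 2 and 3 from the output above both fall inside the first of these.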

ECII