125

I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length().

x <- c(1, 2, 3, 4, 5, 6, 1)  # FALSE
y <- rep(2, times = 7)       # TRUE

With unique():

length(unique(x)) == 1
length(unique(y)) == 1

With rle():

length(rle(x)$values) == 1
length(rle(y)$values) == 1

A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.

Is there a built-in function for this type of test that I have completely overlooked? identical() and all.equal() compare two R objects, so they won't work here.

Edit 1

Here are some benchmarking results. Using the code:

library(rbenchmark)

John <- function() all( abs(x - mean(x)) < .Machine$double.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}

x <- runif(500000)

benchmark(John(), DWin(), zero_range(),
  columns=c("test", "replications", "elapsed", "relative"),
  order="relative", replications = 10000)

With the results:

          test replications elapsed relative
2       DWin()        10000 109.415 1.000000
3 zero_range()        10000 126.912 1.159914
1       John()        10000 208.463 1.905251

So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5 is fastest.

kmm
  • For equality without tolerance, `max(x) == min(x)` is an order of magnitude faster than `diff(range(x))`, and works with characters as well as numbers – Waldi Jan 14 '22 at 10:55
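
For illustration (an added sketch, not from the thread): min() and max() are defined for character vectors via string collation, so Waldi's comparison needs no numeric coercion. Note that it returns NA if the vector contains NA values.

s <- c("b", "b", "b")
max(s) == min(s)  # TRUE; works for characters as well as numbers, but offers no tolerance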

11 Answers

72

Why not simply use the variance:

var(x) == 0

If all the elements of x are equal, you will get a variance of 0. This only works for doubles and integers, though.

Edit based on the comments below:
A more generic option is to check the number of unique elements in the vector, which must be 1 in this case. This has the advantage of working for all classes, not just the doubles and integers from which a variance can be calculated.

length(unique(x)) == 1
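
For example (an added illustration, not part of the original answer):

v1 <- c(2, 2, 2)
var(v1) == 0              # TRUE: zero variance means all elements are equal

v2 <- c("a", "a", "b")
length(unique(v2)) == 1   # FALSE; also works for characters, where var() does not
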
Yohan Obadia
  • 34
    `length(unique(x)) == 1` ends up being about twice as fast, but `var` is terse, which is nice. – AdamO Jul 18 '17 at 18:04
  • 1
    YohanBadia, I have an array c(-5.532456e-09, 1.695298e-09), and get `John test: TRUE ; DWin test: TRUE ; zero-range test: TRUE ; variance test: FALSE`, meaning all the other tests recognise the values as identical in R. How can the variance test be used in that context? – mjs Jan 24 '20 at 09:20
  • 1
    The 2 values in your array are not identical. Why would you want the test to return `TRUE`? In the case of John's answer, you check whether the difference is above a certain threshold. In your case the difference between the 2 values is very low, which could lead to it being below the threshold you defined. – Yohan Obadia Jan 24 '20 at 13:56
  • 1
    "Why not simply using the variance"? Because `var(x)` is `NA` for `x <- c("a", "b")` – bers Oct 26 '21 at 08:27
  • Var will work in instances where the length of the vector in question change and the numeric that it equals is static, removes the need to reference more dynamic objects -- numeric only – JJ Fantini Mar 17 '22 at 15:36
42

If they're all numeric values, and tol is your tolerance, then...

all( abs(y - mean(y)) < tol ) 

is the solution to your problem.

EDIT:

After looking at this and the other answers, and benchmarking a few things, the following comes out over twice as fast as the DWin answer.

abs(max(x) - min(x)) < tol

It is a little surprising that this is faster than diff(range(x)), since diff shouldn't be much different from - and abs with two numbers, and requesting the range should optimize getting the minimum and maximum. Both diff and range are primitive functions. But the timing doesn't lie.

And, in addition, as @Waldi pointed out, abs is superfluous here.
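
Wrapped up as a reusable helper, the test might look like this (a sketch; the function name and the default tolerance are editorial choices, not part of the answer):

all_within_tol <- function(x, tol = .Machine$double.eps ^ 0.5) {
  # TRUE when the spread of x is below the tolerance
  max(x) - min(x) < tol
}

all_within_tol(c(1, 1 + 1e-10, 1))  # TRUE
all_within_tol(c(1, 2, 3))          # FALSE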

John
  • Can you comment on the relative merits of subtracting off the mean compared to dividing by it? – hadley Jan 22 '11 at 03:43
  • It is computationally simpler. Depending on the system, and how R is compiled and vectorized, it will be accomplished faster with less power consumption. Also, when you divide by the mean your tested outcome is relative to 1 while with subtraction it's 0, which seems nicer to me. Also, the tolerance has a more straightforward interpretation. – John Jan 22 '11 at 08:19
  • 1
    But it's not even so much that division is complex as that the search and sort required to extract the range are much more computationally expensive than a simple subtraction. I tested it, and the above code is about 10x faster than the zero_range function, Hadley (and yours is about the fastest correct answer here). Dirk's compare function is brutally slow. This is the fastest answer here. – John Jan 22 '11 at 08:25
  • Just saw Josh's timing comments in your answer, Hadley... I don't get any situations where zero_range is faster. The discrepancy ranges from slightly faster (maybe 20%) to 10x, always in favour of this answer. I tried a number of methods. – John Jan 22 '11 at 08:38
  • Or simply: `max(x) - min(x) < tol` – Waldi Jan 14 '22 at 10:45
  • 1
    @Waldi good point. Although, the parentheses around the subtraction would need to be retained. – John Jan 16 '22 at 02:24
41

I use this method, which compares the min and the max, after dividing by the mean:

# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = tol))
}

If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.
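
One way to sketch that NA handling (an editorial addition, not part of the original answer):

zero_range_na <- function(x, tol = .Machine$double.eps ^ 0.5) {
  x <- x[!is.na(x)]                  # drop missing values up front
  if (length(x) <= 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = tol))
}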

hadley
  • I chose this one for being faster than Dirk's. I don't have millions of elements, but this should run a little quicker for me. – kmm Jan 21 '11 at 00:02
  • @Kevin: what about John's solution? It's ~10x faster than Hadley's and allows you to set tolerance. Is it deficient in some other way? – Joshua Ulrich Jan 21 '11 at 15:36
  • Please provide some benchmarking - I just checked mine is about the same for a vector of a million uniforms. – hadley Jan 21 '11 at 17:24
  • @hadley: I was running `system.time(for(i in 1:1e4) zero_range(x))`, where `x` was from the OP. John's solution is ~10x faster for `x`, ~3x faster for `y`, and slightly slower for `runif(1e6)`. – Joshua Ulrich Jan 21 '11 at 18:34
  • 10x difference doesn't matter much when you're looking at the difference between 0.00023 and 0.000023 seconds - and DWin would probably claim they are the same to the specified degree of tolerance ;) – hadley Jan 22 '11 at 01:32
  • Nice. What does `isTRUE` do that `all.equal` doesn't already do? Thanks. – PatrickT Nov 03 '17 at 09:13
31

You can just check all(v==v[1])

Maya Levy
  • This one is great because it works with strings too! Thanks – arvi1000 Jan 15 '20 at 22:18
  • 4
    This works unless you have `NA` in your vector: `x <- c(1,1,NA); all(x == x[1])` returns `NA`, not `FALSE`. In such cases `length(unique(x)) == 1` works. – HBat Aug 10 '20 at 10:22
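
Building on HBat's comment, here is one NA-aware variant of the same idea (an added sketch; note that it deliberately treats c(1, 1, NA) as all-equal after dropping the NA, which differs from length(unique(x)) == 1):

all_same <- function(v) {
  v <- v[!is.na(v)]              # drop NAs; change this if NA should mean FALSE
  length(v) == 0 || all(v == v[1])
}

all_same(c(1, 1, NA))  # TRUE once the NA is dropped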
25
> isTRUE(all.equal(max(y), min(y)))
[1] TRUE
> isTRUE(all.equal(max(x), min(x)))
[1] FALSE

Another along the same lines:

> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE
IRTFM
  • I don't think this works so well for very small numbers: `x <- seq(1, 10) / 1e10` – hadley Jan 20 '11 at 21:08
  • 2
    @Hadley: The OP asked for a solution that would allow specification of a tolerance, presumably because he didn't care about very small differences. all.equal can be used with other tolerances and the OP appears to understand this. – IRTFM Jan 20 '11 at 21:13
  • 2
    I didn't express myself very clearly - in my example there is a ten-fold relative difference between the largest and smallest numbers. That's probably something you want to notice! I think numerical tolerance needs to be calculated relative to the range of the data - I have not done this in the past and it has caused problems. – hadley Jan 20 '11 at 21:19
  • 3
    I don't think I misunderstood you in the slighest. I just thought the questioner was asking for a solution that would ignore a tenfold relative difference for numbers that are effectively zero. I heard him as asking for a solution that would ignore the difference between 1e-11 and 1e-13. – IRTFM Jan 20 '11 at 21:27
  • 5
    I try and give people what they need, not what they want ;) But point taken. – hadley Jan 21 '11 at 01:21
16

You can use identical() and all.equal() by comparing the first element to all others, effectively sweeping the comparison across:

R> compare <- function(v) all(sapply( as.list(v[-1]), 
+                         FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R> 

That way you can bring in an epsilon as needed by swapping in all.equal() with a tolerance, since identical() itself does not accept one.
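
A sketch of that substitution (an editorial addition; the function name compare_tol is my own):

compare_tol <- function(v, tol = .Machine$double.eps ^ 0.5) {
  all(sapply(as.list(v[-1]),
             FUN = function(z) isTRUE(all.equal(z, v[1], tolerance = tol))))
}

compare_tol(x)  # FALSE
compare_tol(y)  # TRUE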

Dirk Eddelbuettel
11

Since I keep coming back to this question over and over, here's an Rcpp solution that will generally be much faster than any of the R solutions when the answer is actually FALSE (because it stops the moment it encounters a mismatch), and that matches the speed of the fastest R solution when the answer is TRUE. For the OP's benchmark, for example, system.time clocks in at exactly 0 using this function.

library(inline)
library(Rcpp)

fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
  NumericVector var(x);
  double precision = as<double>(y);

  for (int i = 0, size = var.size(); i < size; ++i) {
    if (var[i] - var[0] > precision || var[0] - var[i] > precision)
      return Rcpp::wrap(false);
  }

  return Rcpp::wrap(true);
', plugin = 'Rcpp')

fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[1] TRUE
eddi
  • 1
    This is nice & +1 for speed, but I'm not convinced that comparing all elements to the 1st element is quite right. A vector can pass this test, yet the difference between max(x) and min(x) can be greater than precision. For example `fast_equal(c(2,1,3), 1.5)` – dww Apr 06 '17 at 04:29
  • @dww What you're pointing out is that comparison is not transitive when you have precision issues - i.e. `a == b`, `b == c` does not necessarily imply `a == c` if you're doing floating point comparisons. You can either divide your precision by the number of elements to avoid this issue, or modify the algorithm to compute `min` and `max` and use that as a stopping condition, as in the sketch below. – eddi Apr 06 '17 at 15:36
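
A sketch of that min/max stopping condition, written with the newer Rcpp::cppFunction() interface (an editorial addition, not eddi's code; it returns FALSE for the fast_equal(c(2, 1, 3), 1.5) case dww raised):

library(Rcpp)

cppFunction('
bool fast_equal_minmax(NumericVector x, double tol) {
  if (x.size() == 0) return true;
  double lo = x[0], hi = x[0];
  for (int i = 1; i < x.size(); ++i) {
    if (x[i] < lo) lo = x[i];
    if (x[i] > hi) hi = x[i];
    if (hi - lo > tol) return false;  // stop as soon as the spread exceeds tol
  }
  return true;
}')

fast_equal_minmax(c(2, 1, 3), 1.5)  # FALSE: max - min = 2 > 1.5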
10

I wrote a function specifically for this. It can check not only the elements of a vector, but also whether all elements of a list are identical. It handles character vectors and all other vector types as well, and it has appropriate error handling.

all_identical <- function(x) {
  if (length(x) == 1L) {
    warning("'x' has a length of only 1")
    return(TRUE)
  } else if (length(x) == 0L) {
    warning("'x' has a length of 0")
    return(logical(0))
  } else {
    TF <- vapply(1:(length(x)-1),
                 function(n) identical(x[[n]], x[[n+1]]),
                 logical(1))
    all(TF)
  }
}

Now try some examples.

x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x)       ## Return FALSE
all_identical(x[-4])   ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
          fac2 = factor(c("A", "B"), levels = c("B", "A"))
          )
all_identical(y)     ## Return FALSE as fac1 and fac2 have different level order
Lawrence Lee
5

You do not actually need to use min, mean, or max. Based on John's answer:

all(abs(x - x[[1]]) < tolerance)
3

Here is an alternative using the min/max trick, but for a data frame. In the example I am comparing columns, but the MARGIN argument of apply can be changed to 1 for rows.

valid <- sum(apply(your_dataframe, 2, function(x) max(x) - min(x)) != 0)

If valid == 0, then all the elements are the same.
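
An equivalent, arguably more direct per-column check (an added sketch; your_dataframe is the same placeholder name as above):

# one logical per column; wrap in all() for a single TRUE/FALSE
sapply(your_dataframe, function(col) length(unique(col)) == 1)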

pedrosaurio
2

Another solution, compatible with strings and NA, uses the data.table package: uniqueN(x) == 1
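
For example (an added illustration; na.rm is an argument of uniqueN()):

library(data.table)

uniqueN(c("a", "a", NA)) == 1                # FALSE: NA counts as a distinct value
uniqueN(c("a", "a", NA), na.rm = TRUE) == 1  # TRUE once NAs are removed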

Daniel V