In a custom function, I want to run an if condition if my vector only has one unique value. I can use length(unique(x)) == 1. However, I think that this could be more efficient: instead of getting all the unique values in the vector and then counting them, I could just stop after finding one value that differs from the first one:
# Should be TRUE
test <- rep(1, 1e7)
bench::mark(
length(unique(test)) == 1,
all(test == test[1])
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 length(unique(test)) == 1  154.1ms  158.6ms      6.31   166.1MB     6.31
#> 2 all(test == test[1])        38.1ms   49.2ms     19.6     38.1MB     3.92
# Should be FALSE
test2 <- rep(c(1, 2), 1e7)
bench::mark(
length(unique(test2)) == 1,
all(test2 == test2[1])
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 length(unique(test2)) == 1  341.2ms  386.1ms      2.59   332.3MB     2.59
#> 2 all(test2 == test2[1])       59.5ms   81.1ms     11.5     76.3MB     1.92
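It is indeed more efficient: in both cases the all() version is roughly 3 to 5 times faster at the median and allocates about a quarter of the memory.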
Now, suppose that I want to replace length(unique(x)) == 2. I could probably do something similar and stop as soon as I find 3 different values, but I don't see how I can generalize this to replace length(unique(x)) == n, where n can be any positive integer.
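To make the "stop early" idea concrete for general n, one possible sketch is to scan the vector in chunks, keep the distinct values seen so far, and bail out as soon as there are more than n of them. The function name has_n_unique and the chunk_size parameter are hypothetical, and this is only an illustration of the idea in base R, not a benchmarked solution:

```r
# Hypothetical helper (not from any package): TRUE if x has exactly n
# distinct values. NA counts as a value, as it does for unique().
has_n_unique <- function(x, n, chunk_size = 1e5) {
  seen <- x[0]                 # empty vector of the same type as x
  len <- length(x)
  start <- 1
  while (start <= len) {
    end <- min(start + chunk_size - 1, len)
    # Merge the distinct values of this chunk into those already seen
    seen <- unique(c(seen, x[start:end]))
    # Early exit: more than n distinct values means the answer is FALSE
    if (length(seen) > n) return(FALSE)
    start <- end + 1
  }
  length(seen) == n
}
```

The early exit only helps when the vector has more than n distinct values and they show up early; when the answer is TRUE, the whole vector still has to be scanned.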
Is there an efficient and general way to do this?
(I'm looking for a solution in base R, and if you can improve the benchmark for n = 1, feel free to suggest an improvement.)