Using integer division (%/%) might be an efficient way:
df$group <- (df$birthyear - 1989L) %/% 3L
df
# x1 birthyear group
#1 1 1992 1
#2 5 1994 1
#3 7 1993 1
#4 8 1992 1
#5 2 1995 2
#6 2 1999 3
#7 3 2000 3
#8 4 2001 4
#9 5 2000 3
#10 10 1994 1
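Where the offset 1989 comes from can be sketched as follows. The data frame is rebuilt from the printed output above; the names first_year and width are only illustrative and assume that 1992 should open group 1 and that each group spans 3 years.

```r
# Rebuild the example data from the printed output above
df <- data.frame(
  x1        = c(1L, 5L, 7L, 8L, 2L, 2L, 3L, 4L, 5L, 10L),
  birthyear = c(1992L, 1994L, 1993L, 1992L, 1995L,
                1999L, 2000L, 2001L, 2000L, 1994L)
)

first_year <- 1992L  # assumed: first year that belongs to group 1
width      <- 3L     # assumed: number of years per group

# first_year - width is 1989, so 1992..1994 give 3..5, which %/% 3 maps to 1
df$group <- (df$birthyear - (first_year - width)) %/% width
df$group
```

Subtracting first_year instead of first_year - width would number the groups from 0 rather than 1.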
To start from the lowest birthyear:
(df$birthyear - min(df$birthyear) + 3L) %/% 3L
# [1] 1 1 1 1 2 3 3 4 3 1
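One thing to keep in mind with the min() variant is that the group boundaries then depend on the data. A small sketch with made-up years (the vector yrs is hypothetical, not from the question) shows how the two variants can disagree when the lowest year is not 1992:

```r
# Hypothetical years whose minimum is 1993 instead of 1992
yrs <- c(1993L, 1994L, 1995L, 1996L)

# Boundaries anchored at the lowest observed year: groups start at 1993
by_min <- (yrs - min(yrs) + 3L) %/% 3L

# Boundaries anchored at a fixed year: groups start at 1992
by_fixed <- (yrs - 1989L) %/% 3L

by_min    # 1 1 1 2
by_fixed  # 1 1 2 2
```

With a fixed offset the same year always lands in the same group, regardless of which rows are present.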
In case the range should be enforced, pmin and pmax can be used to clamp the values to it:
(pmax(1989L, pmin(2023L, df$birthyear)) - 1989L) %/% 3L
# [1] 1 1 1 1 2 3 3 4 3 1
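For the example data the clamping changes nothing, because all years are already inside the limits. A short sketch with made-up out-of-range years (the vector yrs is hypothetical) shows the effect:

```r
# Hypothetical years, two of them outside 1989..2023
yrs <- c(1980L, 1992L, 2010L, 2030L)

# Without clamping, years outside the range give negative or very large groups
raw_group <- (yrs - 1989L) %/% 3L

# pmin caps at 2023, pmax raises to at least 1989, then the same division
clamped_group <- (pmax(1989L, pmin(2023L, yrs)) - 1989L) %/% 3L

raw_group      # -3  1  7 13
clamped_group  #  0  1  7 11
```

Everything below 1989 ends up in group 0 and everything above 2023 in group 11, so out-of-range years stay recognizable by their group number.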
findInterval could also be used:
findInterval(df$birthyear, seq(1992, 2022, 3))
# [1] 1 1 1 1 2 3 3 4 3 1
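As a quick check, assuming the breaks seq(1992, 2022, 3) from above, findInterval and the integer division agree for every year from 1992 to 2021. The out-of-range years in the second part are made up for illustration:

```r
breaks <- seq(1992, 2022, 3)  # 1992, 1995, ..., 2022

# Every year covered by the breaks gets the same group from both approaches
yrs <- 1992:2021
same <- identical(findInterval(yrs, breaks), (yrs - 1989L) %/% 3L)
same  # TRUE

# Years outside the breaks: 0 below the first break,
# length(breaks) at or above the last one
outside <- findInterval(c(1990, 2025), breaks)
outside  # 0 11
```

Unlike the division, findInterval does not need equally wide groups, since any sorted vector of breaks works.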
Benchmark:
set.seed(42)
x <- sample(1992:2021, 10001, TRUE)
bench::mark(
"cut" = cut(x, seq(1992, 2022, 3), labels = FALSE, right = FALSE),
"findInterval" = findInterval(x, seq(1992, 2022, 3)),
"%/%pminMax" = (pmax(1989L, pmin(2023L, x)) - 1989L) %/% 3L,
"%/%min" = (x - min(x) + 3L) %/% 3L,
"%/%" = (x - 1989L) %/% 3L
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 cut 219µs 223.9µs 3875. 117.3KB 8.17 1898 4
#2 findInterval 143.2µs 148.9µs 6450. 117.3KB 13.6 2855 6
#3 %/%pminMax 75.2µs 77.7µs 12263. 117.4KB 27.3 5835 13
#4 %/%min 53.7µs 54.1µs 18153. 39.1KB 12.3 8852 6
#5 %/% 35.5µs 35.9µs 27166. 39.1KB 19.0 9993 7