In R, an advantage of integers over doubles is object size, but I am somewhat surprised to find no corresponding advantage in performance. My naive expectation was that operating on less information would be more efficient.
My work involves a lot of number crunching, and I wanted to decide whether to consistently use integers or doubles in my data.tables and functions.
I'm aware of integer overflow, but it is not an issue for my specific variables.
I am talking about variables that are integer by nature: they never become fractions/decimals. They still need to be transformed (using R's operators), but always back into integers.
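(As an aside on the overflow point: R warns and returns NA on integer overflow rather than silently wrapping, so it is easy to detect, e.g.:)

.Machine$integer.max  # 2147483647
2147483647L + 1L      # NA, with a warning about integer overflow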
set.seed(1)
d <- sample(c(31318, 110221, 103351, 72108, 231533, 155212, 173406), 1e4, replace = TRUE)  # HHMMSS-encoded times, stored as doubles
i <- as.integer(d)  # the same values, stored as integers

f1 <- function(x) {  # double arithmetic, truncating with trunc()
  hour <- trunc(x / 1e4)
  min <- trunc((x - hour * 1e4) / 1e2)
  sec <- x - hour * 1e4 - min * 1e2
  as.integer(hour * 3600 + min * 60 + sec)
}
f2 <- function(x) {  # double arithmetic via %% and %/%
  hh <- x %/% 1e4
  mm <- x %% 1e4 %/% 1e2
  ss <- x %% 1e2
  as.integer(hh * 3600 + mm * 60 + ss)
}
f1i <- function(x) {  # integer literals, truncating with as.integer()
  hour <- as.integer(x / 1e4L)
  min <- as.integer((x - hour * 1e4L) / 1e2L)
  sec <- as.integer(x - hour * 1e4L - min * 1e2L)
  hour * 3600L + min * 60L + sec
}
f2i <- function(x) {  # integer literals via %% and %/%
  hh <- x %/% 1e4L
  mm <- x %% 1e4L %/% 1e2L
  ss <- x %% 1e2L
  hh * 3600L + mm * 60L + ss
}
microbenchmark::microbenchmark(
  f1(i), f2(i), f1i(i), f2i(i),
  f1(d), f2(d), f1i(d), f2i(d),
  times = 1e2
)
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 f1(i) 277.413 279.4670 316.0315 282.1055 341.3420  928.132   100
 f2(i) 705.557 707.0230 829.8002 710.6880 796.6105 5366.158   100
f1i(i) 355.124 356.5910 451.0255 358.4965 449.4035 3242.158   100
f2i(i) 346.620 347.7930 391.1675 349.6990 366.5605  989.714   100
 f1(d) 237.824 240.3175 350.9075 242.5170 295.3025 6946.476   100
 f2(d) 702.037 703.9435 869.6909 708.1960 874.7610 5113.378   100
f1i(d) 341.048 342.9545 514.6488 345.0075 428.8765 4231.285   100
f2i(d) 705.556 707.3160 777.2969 710.3955 882.5325 1855.678   100
object.size(d) # 80048 bytes (8 bytes per element)
object.size(i) # 40048 bytes (4 bytes per element)
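For completeness, a quick sanity check (not part of the benchmark) that all variants agree on the values. One subtlety: f2i(d) returns doubles rather than integers, because %/% between a double and an integer yields a double.

res <- list(f1(i), f2(i), f1i(i), f2i(i), f1(d), f2(d), f1i(d), f2i(d))
sapply(res, typeof)                                   # "integer" everywhere except f2i(d)
all(sapply(res[-1], function(r) all(r == res[[1]])))  # TRUE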
- Why is there no performance advantage to consistently operating on integers?
- What is the use of modulus or integer division if trunc((x - hour * 1e4) / 1e2) is more efficient than x %% 1e4L %/% 1e2L? (A sketch for isolating the operator costs follows after this list.)
- And, most importantly, what would be best practice from the point of view of an experienced R / data.table user?
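To narrow down the second question, one could time the individual operators on the same data (a minimal sketch; results omitted):

microbenchmark::microbenchmark(
  trunc(d / 1e2),  # double division plus trunc()
  d %% 1e2,        # modulus on doubles
  d %/% 1e2,       # integer division on doubles
  i %% 1e2L,       # modulus on integers
  i %/% 1e2L,      # integer division on integers
  times = 1e2
)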