Perhaps this is already answered and I missed it, but it's hard to search.
A very simple question: why is dt[,x] generally a tiny bit faster than dt$x?
Example:
library(data.table)
library(microbenchmark)

dt <- data.table(id = 1:1e7, var = rnorm(1e7))  # var must match id's length
test <- microbenchmark(times = 100L,
  dt[sample(1e7, size = 200000), var],
  dt[sample(1e7, size = 200000), ]$var)
test[, "expr"] <- c("in j", "$")  # shorten the expression labels
Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
     $ 14.28863 15.88779 18.84229 17.23109 18.41577 53.63473   100
  in j 14.35916 15.97063 18.87265 17.99266 18.37939 54.19944   100
I might not have chosen the best example, so feel free to suggest a more illustrative one.
Anyway, evaluating in j is faster at least 75% of the time (though there appears to be a fat upper tail, since the mean is higher; as a side note, it would be nice if microbenchmark could produce histograms for me).
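On the histogram side note: as far as I know, microbenchmark objects can already be plotted; a rough sketch, assuming ggplot2 is installed (the `time` column is in nanoseconds):

```r
library(ggplot2)

# Violin-style timing plot shipped with the microbenchmark package:
autoplot(test)

# A per-expression histogram, converting nanoseconds to milliseconds:
ggplot(as.data.frame(test), aes(x = time / 1e6)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ expr) +
  xlab("milliseconds")
```

This is only a sketch against the `test` object created above, not something the question depends on.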
Why is this the case?