Common sense suggests that a computation should get faster the more cores or threads we use. If the scaling is bad, the computation time will simply stop improving as the number of threads increases. So why does increasing the number of threads considerably increase the computation time when fitting a GAM with the R package mgcv, as shown by this example?
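library(mgcv) # provides gam() and tw()
library(boot) # provides the data set "amis"
t1 <- Sys.time()
# Fit with a single thread; change nthreads to 2, 3, ... to compare
mod <- gam(speed ~ s(period, warning, pair, k = 12), data = amis, family = tw(link = log), method = "REML", control = list(nthreads = 1))
t2 <- Sys.time()
print("Model fitted in:")
print(t2 - t1)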
If you increase the number of threads in this example to 2, 3, etc., the fitting procedure takes longer and longer instead of getting faster, as we would expect. In my particular case:
1 thread: 32.85333 secs
2 threads: 50.63166 secs
3 threads: 1.2635 mins (about 75.8 secs)
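For reference, timings like the ones above could be collected in a single run with something like the following sketch. The helper name time_gam and the vector thread_counts are made up for illustration, and passing gam.control(nthreads = ...) is assumed to behave the same as the control = list(nthreads = ...) form used in the example.

```r
library(mgcv)  # gam(), tw(), gam.control()
library(boot)  # provides the "amis" data set

# Hypothetical helper: fit the example model with n_threads threads
# and return the elapsed wall-clock time in seconds.
time_gam <- function(n_threads) {
  system.time(
    gam(speed ~ s(period, warning, pair, k = 12), data = amis,
        family = tw(link = "log"), method = "REML",
        control = gam.control(nthreads = n_threads))
  )[["elapsed"]]
}

# Thread counts to compare (assumed; extend as needed).
thread_counts <- c(1, 2, 3)

# One row per thread count, with elapsed time always in seconds.
timings <- data.frame(
  threads = thread_counts,
  seconds = vapply(thread_counts, time_gam, numeric(1))
)
print(timings)
```

Using system.time() rather than a difference of Sys.time() values keeps every result in seconds, so the rows are directly comparable.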
Why is this? If I am doing something wrong, what can I do to obtain the desired behavior (i.e., increasing performance with increasing number of threads)?
Some notes:
1) The model, family and fitting method shown here make no particular sense; this is only an example. However, I ran into this problem with real data and a reasonable model, and I use this small piece of code only to illustrate it. The data, functional form of the model, family and fitting method all seem to be irrelevant: after many tests I always get the same behaviour, i.e., increasing the number of threads decreases performance (increases computation time).
2) Operating system: Linux Ubuntu 18.04.
3) Architecture: Dell PowerEdge with two physical Intel Xeon X5660 CPUs, each with 6 cores @ 2800 MHz, and each core capable of handling 2 threads (i.e., 24 threads in total). 80 GB RAM.
4) The OpenMP libraries (which are needed for the multi-threading capability of the function gam) were installed with
sudo apt-get install libomp-dev
5) I am aware of the help page for multi-core use of gam (https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/mgcv-parallel.html). The only thing written there pointing to a decrease of performance with increasing number of threads is "Because the computational burden in mgcv is all in the linear algebra, then parallel computation may provide reduced (...) benefit with a tuned BLAS".
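Since that remark depends on which BLAS is in use, a small sketch like the one below could report what the R session is linked against and how many logical cores R sees. It assumes R 3.4.0 or later (for the BLAS and LAPACK entries of sessionInfo() and for La_library()); the variable names are made up for illustration, and the output by itself does not explain the slowdown.

```r
library(parallel)  # detectCores()

# Paths of the BLAS and LAPACK libraries used by this R session
# (assumes R >= 3.4.0, where sessionInfo() reports them).
si <- sessionInfo()
blas_path <- si$BLAS
lapack_path <- La_library()

# Number of logical CPUs visible to R (may be NA on some platforms).
n_logical <- detectCores()

print(blas_path)
print(lapack_path)
print(n_logical)
```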