
Main Question (#1):
Is there a trick to get Anaconda R specifically to use the resident BLAS subsystem on Ubuntu? (Other R builds have no problem, so please leave them aside.)

Other numerical software on the same host (GNU Octave) shows that the BLAS installation is valid and fast: matrix operations scale as expected with the value of the environment variable OMP_NUM_THREADS. No special trick was needed to hook this non-Anaconda software into the resident BLAS subsystem.

Related question (#2):
Is the Intel MKL BLAS distributed with Anaconda, and can Anaconda R use it as a fallback from OpenBLAS? If so, how is MKL activated?

During the benchmark run, htop shows Anaconda R using a single core at 100% and nothing more, so Anaconda R is apparently not using OpenBLAS for parallel matrix code. The result is a slowdown of more than 40x compared with a reference machine running ordinary (non-Anaconda) R, which is an unacceptable performance problem.

The BLAS installation itself was validated as correct and fully functional with GNU Octave: a numerical benchmark improved as anticipated when OMP_NUM_THREADS=1 was changed to OMP_NUM_THREADS=16, the appropriate value for the test hardware.
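One way to narrow this down is to look at which shared libraries each R build actually resolves at load time. The following is only a minimal sketch: `blas_links` is a hypothetical helper name, and the paths `lib/libR.so` and `lib/libRblas.so` under the directory printed by `R RHOME` are assumptions about how the R in question was built, so they may not exist on every installation.

```shell
#!/usr/bin/env bash
# Sketch: report which BLAS-like shared libraries a given R build links against.
# blas_links is a hypothetical helper name, not part of R or conda.
blas_links() {
  # ldd lists the shared libraries a binary resolves at load time;
  # keep only lines that look like BLAS, LAPACK, OpenBLAS or MKL.
  ldd "$1" 2>/dev/null | grep -Ei 'blas|lapack|mkl' \
    || echo "no BLAS-like library found for $1"
}

# Inspect whichever R is first on PATH (Anaconda R or system R).
# The two library paths below are assumptions about the R build layout.
if command -v R >/dev/null 2>&1; then
  r_home="$(R RHOME)"
  for lib in "$r_home/lib/libR.so" "$r_home/lib/libRblas.so"; do
    echo "== $lib"
    blas_links "$lib"
  done
fi
```

Running this once with Anaconda R first on PATH and once with the system R first on PATH should show whether the two builds resolve to different BLAS libraries. If the Anaconda build only shows its own `libRblas.so` and no OpenBLAS or MKL library, it is probably using R's bundled reference BLAS, which would be consistent with the single-core load seen in htop.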

Final Question (#3):
Is the Anaconda maintainer intentionally building the R binary for use with Intel MKL only, excluding OpenBLAS?

Thanks for reading.

Please limit responses to people who have actually used MKL or OpenBLAS successfully with Anaconda packages. Thanks for understanding.

Feel free to use my benchmark:

# Plain old R, OpenBLAS, OMP_NUM_THREADS=4, Intel Core i7-920, 4 HW cores:
n=5e3
print(system.time({ x <- replicate(n, rnorm(n)); tcrossprod(x) }))
# elapsed: 6.1 

# Anaconda R, OpenBLAS, OMP_NUM_THREADS=16, 2 x Intel Xeon-2690, 16 HW cores:
n=5e3
print(system.time({ x <- replicate(n, rnorm(n)); tcrossprod(x) }))
# elapsed: 291.8

GNU Octave benchmarks on same machine 2 x Intel Xeon-2690:
**Note GNU Octave is outside of anaconda**

ga@ga-HP-Z820:~/projects/blas_bench$ bash octave_benchmark.sh
Run program in octave using NVBLAS GPU-parallel algebra code, not CPU. LD_PRELOAD=libnvblas.so:
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'nvblas.conf'
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'nvblas.conf'
Elapsed time:
 0.012405
Gigaflops:
   7.0905e+05
Run program in octave using OpenBLAS CPU-parallel algebra code. OMP_NUM_THREADS=16 LD_PRELOAD=libblas.so.3:
Elapsed time:
 17.559
Gigaflops:
 500.95
Same but without LD_PRELOAD=libblas.so.3, does it still really run fast, indicating OpenBLAS is still used? OMP_NUM_THREADS=16:
Elapsed time:
 16.063
Gigaflops:
 547.61
Run program in almost plain old octave: This might still use OpenBLAS since its default ON on this host. Setting OMP_NUM_THREADS=1 so its nearly stock.
Elapsed time:
 233.34
Gigaflops:
 37.696

(Please ignore the NVBLAS result. That test did not complete because the proprietary NVIDIA video driver installation was damaged by an Ubuntu automatic update at some point, sadly.)
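For reference, the ratios implied by the timings above can be recomputed from the elapsed times as posted. This is just arithmetic on those figures; `ratio` is a hypothetical helper name, and the R comparison is across two different machines, so it is only a rough indication.

```shell
#!/usr/bin/env bash
# ratio is a hypothetical helper: prints a/b rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Elapsed seconds copied from the benchmark output in this post.
echo "Anaconda R (16 cores) vs plain R (4 cores), slowdown: $(ratio 291.8 6.1)x"
echo "Octave, OMP_NUM_THREADS=16 vs 1, speedup: $(ratio 233.34 16.063)x"
```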

Geoffrey Anderson
  • @zx485 Did you ask me something? – Geoffrey Anderson Mar 06 '17 at 23:10
  • This is definitely related. They might be the Continuum devs. https://github.com/conda/conda/issues/2097 Maybe I need to switch to the nomkl version of all the Anaconda packages I have installed? How? Is there a simple on/off switch for doing that? – Geoffrey Anderson Mar 06 '17 at 23:10
