We have Python code that involves expensive linear algebra computations. The data is stored in NumPy arrays. The code uses numpy.dot and a few BLAS and LAPACK functions, which are currently accessed through scipy.linalg.blas and scipy.linalg.lapack. The current code is written for the CPU. We want to convert the code so that some of the NumPy, BLAS, and LAPACK operations are performed on a GPU.
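For context, the existing CPU code follows this general pattern (a minimal sketch, not our actual code; the matrices and the specific routines dgemm and dpotrf are illustrative):

```python
import numpy as np
from scipy.linalg import blas, lapack

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))

# Plain NumPy matrix product
c = np.dot(a, b)

# The same product through the low-level BLAS wrapper: C = alpha * A @ B
c_blas = blas.dgemm(alpha=1.0, a=a, b=b)

# A LAPACK call: Cholesky factorization of a symmetric positive-definite matrix
spd = a @ a.T + 4.0 * np.eye(4)
chol, info = lapack.dpotrf(spd, lower=1)  # info == 0 on success
```

We would like the GPU version to preserve this mix of high-level NumPy calls and direct BLAS/LAPACK calls.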
I am trying to determine the best way to do this. As far as I can tell, Numba does not support BLAS and LAPACK functions on the GPU. It appears that PyCUDA may be the best route, but I am having trouble determining whether PyCUDA allows one to use both BLAS and LAPACK functions.
EDIT: We need the code to be portable to different GPU architectures, including AMD and Nvidia. While PyCUDA appears to offer the desired functionality, CUDA (and hence PyCUDA) cannot run on AMD GPUs.