I am implementing simple matrix-matrix and matrix-vector multiplication with OpenBLAS, using dgemm and dgemv. OpenBLAS only runs on one core.
I tried adding -lpthread when compiling, but it still uses only one core.
I call dgemv and dgemm in a straightforward way:
cblas_dgemv(order, trans, m, n, alpha, a, lda, x, incx, beta, y, incy);
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, M, B, K, beta, C, M);
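The following naive, single-threaded reference helps confirm what the dimension and leading-dimension arguments mean. It covers the column-major, no-transpose case of both calls, assuming the matrices are stored column-major (the 1, M strides in the question suggest this). The names ref_dgemm_colmajor and ref_dgemv_colmajor are hypothetical and are not part of any BLAS. The sketch only shows how m, n, k, lda, ldb, ldc, incx and incy index the arrays for C = alpha*A*B + beta*C and y = alpha*A*x + beta*y.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for C = alpha*A*B + beta*C (column-major, no transposes).
 * A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m).
 * Element (i, j) of A is stored at A[i + j*lda]. */
void ref_dgemm_colmajor(int m, int n, int k, double alpha,
                        const double *A, int lda,
                        const double *B, int ldb,
                        double beta, double *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            /* like BLAS, do not read C when beta == 0 */
            double old = (beta == 0.0) ? 0.0 : beta * C[i + (size_t)j * ldc];
            C[i + (size_t)j * ldc] = alpha * sum + old;
        }
    }
}

/* Hypothetical reference for y = alpha*A*x + beta*y (column-major, no transpose).
 * A is m x n with lda >= m; incx and incy are assumed positive here. */
void ref_dgemv_colmajor(int m, int n, double alpha, const double *A, int lda,
                        const double *x, int incx, double beta,
                        double *y, int incy)
{
    assert(lda >= m && incx > 0 && incy > 0);
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += A[i + (size_t)j * lda] * x[(size_t)j * incx];
        /* like BLAS, do not read y when beta == 0 */
        double old = (beta == 0.0) ? 0.0 : beta * y[(size_t)i * incy];
        y[(size_t)i * incy] = alpha * sum + old;
    }
}
```

Comparing cblas_dgemm and cblas_dgemv against a reference like this on a small input is a quick way to confirm the argument order before timing multi-threaded runs on large matrices.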
Has anyone successfully run OpenBLAS on multiple cores?