I have a question about the BLAS and cublas interface and how to do a particular matrix-vector multiplication.
I have a call that I currently do with cblas_zgemv
and gives the correct result.
Basically I have a complex 3x3 matrix A
and a complex vector v
(3 components), in c-ordering all contiguous, and I want to multiply the hermitian-conjugate (conjugate and transpose) of the matrix to the vector.
This call can be made with cblas by calling
std::complex<double> const alpha=1.0;
std::complex<double> const beta=0.0;
cblas_zgemv(CblasRowMajor, CblasConjTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)
It looks as if this cannot be implemented by the BLAS interface. http://www.netlib.org/lapack/explore-html/dc/dc1/group__complex16__blas__level2_gafaeb2abd9fffa7442b938dc384aeaf47.html#gafaeb2abd9fffa7442b938dc384aeaf47
The reason seems to be that BLAS doesn't have the "conjugate-only" option in the "trans" argument.
The equivalent call in BLAS (not cblas) would be:
gemv( trans , 3, 3, alpha, A_pointer, 3, v_pointer, 1, beta, result_pointer, 1);
The trans
options are only N
(no-transpose, i.e. tranpose in C-ordering), T
(tranpose, i.e no-transpose in C), C
(conjugate-transpose, only-conjugate in C).
By symmetry, there another combination missing, something like CO
("conjugate-only" --without transpose--, which would correspond to conjugate-tranpose in C-ordering).
So, it seems that there is an inconvenient hole in BLAS for complex elements. What is worst, this limitation seems to propagate to CUDA BLAS (cuBLAS) which has the same interface as BLAS basically. Making a cuBLAS call would be my final goal. https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemv
Like in BLAS, cuBLAS has only three options (via enum) for the transposition option: CUBLAS_OP_N
, CUBLAS_OP_T
, CUBLAS_OP_H
.
Am I missing something obvious? Is this a known limitation and there is a well known workaround?
NOTE: I know that perhaps this particular case can be implemented with GEMM in a case with dimension 1xn
, but then the same argument can be made about GEMM, it lacks the the "conjugate-only" option.