1

I have a question about the BLAS and cublas interface and how to do a particular matrix-vector multiplication.

I have a call that I currently do with cblas_zgemv and gives the correct result.

Basically I have a complex 3x3 matrix A and a complex vector v (3 components), in c-ordering all contiguous, and I want to multiply the hermitian-conjugate (conjugate and transpose) of the matrix to the vector. This call can be made with cblas by calling

std::complex<double> const alpha=1.0;
std::complex<double> const beta=0.0;
cblas_zgemv(CblasRowMajor, CblasConjTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)

It looks as if this cannot be implemented by the BLAS interface. http://www.netlib.org/lapack/explore-html/dc/dc1/group__complex16__blas__level2_gafaeb2abd9fffa7442b938dc384aeaf47.html#gafaeb2abd9fffa7442b938dc384aeaf47

The reason seems to be that BLAS doesn't have the "conjugate-only" option in the "trans" argument.

The equivalent call in BLAS (not cblas) would be:

gemv( trans , 3, 3, alpha, A_pointer, 3, v_pointer, 1, beta, result_pointer, 1);

The trans options are only N (no-transpose, i.e. tranpose in C-ordering), T (tranpose, i.e no-transpose in C), C (conjugate-transpose, only-conjugate in C).

By symmetry, there another combination missing, something like CO ("conjugate-only" --without transpose--, which would correspond to conjugate-tranpose in C-ordering).

So, it seems that there is an inconvenient hole in BLAS for complex elements. What is worst, this limitation seems to propagate to CUDA BLAS (cuBLAS) which has the same interface as BLAS basically. Making a cuBLAS call would be my final goal. https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemv

Like in BLAS, cuBLAS has only three options (via enum) for the transposition option: CUBLAS_OP_N, CUBLAS_OP_T, CUBLAS_OP_H.

Am I missing something obvious? Is this a known limitation and there is a well known workaround?


NOTE: I know that perhaps this particular case can be implemented with GEMM in a case with dimension 1xn, but then the same argument can be made about GEMM, it lacks the the "conjugate-only" option.

alfC
  • 14,261
  • 4
  • 67
  • 118
  • 1
    I checked the lastest `cublas_api.h` and noticed there's a `CUBLAS_OP_CONJG` option with comment `/* conjugate, placeholder - not supported in the current release */`. – Kh40tiK Sep 16 '21 at 01:30
  • @Kh40tiK Good catch. This confirms that there is a hole in BLAS (and cublas) and where to look for when it is implemented. It is notable how this problem is little mentioned around internet. – alfC Sep 16 '21 at 06:49

1 Answers1

3

There is no direct way to do conjugate only with standard BLAS API.

But there is a easy way to do Conjugate only. When you have a row major matrix, this can be done by setting CblasConjTrans and using CblasColMajor instead of CblasRowMajor and vice versa for col major matrix. To summarise

// Conj Transpose
cblas_zgemv(CblasRowMajor, CblasConjTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)

// No transpose  and No Conj
cblas_zgemv(CblasRowMajor, CblasNoTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)

// Only Trans
cblas_zgemv(CblasRowMajor, CblasTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)

//Only conj()
cblas_zgemv(CblasColMajor, CblasConjTrans, 3 /*A size*/, 3/*A size*/, &alpha, A_pointer, 3 /*A stride*/, v_pointer, 1, &beta, result_pointer, 1 /*result_stride*/)

If you want a direct way, try using BLIS library.

  • This is fine but I think this is cblas only. not blas. blas is usually the one generalized to other contexts (cuda) which is what I intent to do eventually. – alfC Aug 13 '22 at 01:04