I am running into some problems computing `A.transpose*A` in CUDA.
Suppose A is an M*N matrix stored in column-major order. I tried to use cublasSgemm_v2, the matrix-matrix multiplication API in cuBLAS, like this:
cublasSgemm_v2(handle,CUBLAS_OP_T,CUBLAS_OP_N,N,N,M,&al,A,N,A,M,&beta,A_result,N)
Before calling this function I checked matrix A and it looks fine, but cuBLAS reports that parameter number 8 has an illegal value, and I don't know why.
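A constraint that may be relevant here, written as a small host-side C sketch. It assumes the error message counts parameters the legacy BLAS way (without the handle), so that number 8 would be lda, and it assumes the documented gemm rule that the leading dimension must be at least the number of rows of the matrix as it is stored. The helper names min_leading_dim and leading_dim_is_legal are hypothetical and not part of cuBLAS.

```c
#include <assert.h>

/* Hypothetical helper, NOT a cuBLAS function.
   rows_op, cols_op: dimensions of op(X) as it enters the product.
   transposed: nonzero when the operation flag is CUBLAS_OP_T.
   Returns the smallest leading dimension a column-major operand can have. */
static int min_leading_dim(int rows_op, int cols_op, int transposed)
{
    assert(rows_op > 0 && cols_op > 0);
    /* With CUBLAS_OP_T the stored matrix is cols_op x rows_op,
       so its row count (and minimum leading dimension) is cols_op. */
    return transposed ? cols_op : rows_op;
}

/* Nonzero when ld is large enough for the stored matrix. */
static int leading_dim_is_legal(int ld, int rows_op, int cols_op, int transposed)
{
    return ld >= min_leading_dim(rows_op, cols_op, transposed);
}
```

Under these assumptions, for A.transpose*A the first operand has op(A) of size N*M with CUBLAS_OP_T, so `leading_dim_is_legal(N, N, M, 1)` is false whenever N < M, while passing M as the leading dimension would satisfy the check.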
So I decided to use another API, cublas<t>syrk(), to compute A.transpose*A. The result is stored only in the lower or upper triangle of the matrix, which means the rest of the matrix is not referenced. How can I write a kernel to copy those elements to the symmetric part?
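To make the indexing concrete, here is a minimal host-side C sketch of the copy, assuming the result is an n*n column-major matrix with leading dimension ld and that the lower triangle was the one filled in. The names idx and copy_lower_to_upper are hypothetical. This is a CPU reference rather than a kernel; in a CUDA version each (row, col) pair with row > col could presumably be handled by one thread, since the reads (strictly lower part) and the writes (strictly upper part) never overlap.

```c
#include <stddef.h>

/* Column-major offset of element (row, col) with leading dimension ld. */
static size_t idx(int row, int col, int ld)
{
    return (size_t)col * (size_t)ld + (size_t)row;
}

/* Mirror the strictly lower triangle of the n x n column-major matrix C
   into its strictly upper triangle. The diagonal is left untouched. */
static void copy_lower_to_upper(float *C, int n, int ld)
{
    for (int col = 0; col < n; ++col) {
        for (int row = col + 1; row < n; ++row) {
            /* C(col, row) in the upper part takes the value of C(row, col). */
            C[idx(col, row, ld)] = C[idx(row, col, ld)];
        }
    }
}
```

If the upper triangle was the one filled in instead, the assignment would go in the opposite direction.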
The other problem is that my program sometimes crashes (in roughly one run out of three) near the beginning of the code, at cudaMalloc, cublasCreate, or somewhere else. I only modified some code in the middle of the program, and it had run many times before without problems. What might cause this?
Thank you