I'm trying to use Gemm for matrix multiplication on Alea GPU, but the following code gives the wrong result.
Gpu gpu = Gpu.Default;
Blas blas = new Blas(gpu);
int m = 2, n = 3; // output dimensions: C will be an m x n matrix
int k = 4;        // inner dimension: A is m x k, B is k x n
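// column-major: each inner initializer below is one column of the matrix
float[,] A = new float[4,2] { {100,200},{2,6},{3,7},{4,8} }; // 2x4 matrix (m x k)
float[,] B = new float[3,4] { {1,4,7,10}, {2,5,8,11}, {3,6,9,12} }; // 4x3 matrix (k x n)
float[,] C = new float[3,2] { {-1,-1}, {-1,-1}, {-1,-1} }; // 2x3 matrix (m x n), filled with -1 to spot untouched entries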
var dA = gpu.AllocateDevice<float>(A);
var dB = gpu.AllocateDevice<float>(B);
var dC = gpu.AllocateDevice<float>(C);
blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,m,dB.Ptr,k,0f,dC.Ptr,m);
var result = Gpu.Copy2DToHost(dC);
This is the result I get. It just copies some numbers from matrix A, and the remaining entries of C keep their initial value of -1.
100 -1 -1
200 -1 -1
Is there anything wrong with the code? Please help.
I'm using Alea GPU 3.0.3 with CUDA Toolkit 8.0.
UPDATE1: I've found that it gives the correct result when I flatten the A, B, and C matrices into 1D arrays. However, I'd still like to know what's wrong with the 2D arrays.
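For reference, here is a minimal sketch of the flattening meant in the update, assuming the 1D arrays simply keep the element order of the 2D initializers (C# stores a float[,] row by row, so each inner initializer becomes one column of a column-major buffer). It makes no Alea GPU calls. `Flatten` and `GemmColumnMajor` are hypothetical helper names, and the CPU loop only computes the product that the Gemm call would be expected to return, for comparison.

```csharp
using System;
using System.Linq;

public static class ColumnMajorDemo
{
    // Hypothetical helper: flattens a 2D array in its in-memory (row-major) order.
    public static float[] Flatten(float[,] a) => a.Cast<float>().ToArray();

    // Hypothetical CPU reference for C = A * B on column-major 1D buffers.
    // A is m x k with leading dimension lda, B is k x n with leading dimension ldb.
    public static float[] GemmColumnMajor(int m, int n, int k,
                                          float[] a, int lda, float[] b, int ldb)
    {
        var c = new float[m * n];
        for (int col = 0; col < n; col++)
        {
            for (int row = 0; row < m; row++)
            {
                float sum = 0f;
                for (int i = 0; i < k; i++)
                    sum += a[row + i * lda] * b[i + col * ldb];
                c[row + col * m] = sum; // column-major store, leading dimension m
            }
        }
        return c;
    }

    public static void Main()
    {
        int m = 2, n = 3, k = 4;
        float[,] A = new float[4,2] { {100,200},{2,6},{3,7},{4,8} };
        float[,] B = new float[3,4] { {1,4,7,10}, {2,5,8,11}, {3,6,9,12} };

        float[] c = GemmColumnMajor(m, n, k, Flatten(A), m, Flatten(B), k);

        // Column-major C, i.e. the columns (169,353), (278,574), (387,795)
        Console.WriteLine(string.Join(" ", c)); // prints "169 353 278 574 387 795"
    }
}
```

Read as a 2x3 matrix, that output is 169 278 387 on the first row and 353 574 795 on the second, which is the result to compare the GPU output against.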