0

How can I iterate over two arrays?

// Computes all pairwise dot products between rows of x and rows of y.
// x: dim_x rows of ms floats (row-major); y: dim_y rows of ms floats.
// solution: row-major dim_x x dim_y matrix; solution[i][j] = dot(x row i, y row j).
// One thread per output element: idy selects the row of x, idx the row of y.
// NOTE(review): assumes dim_x = number of rows of x and dim_y = number of
// rows of y — confirm against the host-side launch code.
__global__ void euclidean(float *x, float *y, int dim_x, int dim_y, int ms, float *solution) {

            int idx = threadIdx.x + blockDim.x * blockIdx.x;
            int idy = threadIdx.y + blockDim.y * blockIdx.y;

            // Guard: the launch grid rarely divides the problem size evenly.
            if (idx < dim_y && idy < dim_x) {

                float result = 0.0f;  // float literal; 0.0 would be a double

                for (int iter = 0; iter < ms; iter++) {

                    float x_e = x[idy * ms + iter];
                    float y_e = y[idx * ms + iter];

                    result += (x_e * y_e);
                }

                // Bug fix: the original kernel never stored the accumulated
                // value, so `solution` was left untouched.
                solution[idy * dim_y + idx] = result;
            }
}

Input: X = [[1,2], [3,4], [5,6], [7,8], [9,10]] and Y = [[0,0], [1,1]]

Expected Output: [[0, 3], [0, 7], [0, 11], [0, 15], [0, 19]]

How can I do this? My difficulty is to iterate on X and Y.

Expected:

[idx: 0 idy: 0 = 0] [idx: 1 idy: 0 = 3] [idx: 2 idy: 0 = 0] [idx: 3 idy: 0 = 7] [idx: 4 idy: 0 = 0] [idx: 0 idy: 1 = 11] [idx: 1 idy: 1 = 0] [idx: 2 idy: 1 = 15] [idx: 3 idy: 1 = 0] [idx: 4 idy: 1 = 19]

Charles Menguy
  • 40,830
  • 17
  • 95
  • 117

1 Answer

2

I would do the following to multiply 2 matrices. This handles boundary conditions so should work on any grid/block size.

// Compute C = A * B for row-major matrices, one thread per element of C.
// The bounds guard makes this correct for any grid/block configuration.
__global__ void matrixMultiply(float * A, float * B, float * C,
                   int numARows, int numAColumns,
                   int numBRows, int numBColumns,
                   int numCRows, int numCColumns) {
    // Coordinates of the C element this thread is responsible for.
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads that fall outside C simply do nothing.
    if (row >= numCRows || col >= numCColumns)
        return;

    // Dot product of row `row` of A with column `col` of B.
    float acc = 0;
    for (int k = 0; k < numAColumns; ++k)
        acc += A[row * numAColumns + k] * B[k * numBColumns + col];

    C[row * numCColumns + col] = acc;
}

If you want a more efficient implementation you can also use the shared memory:

// Compute C = A * B using shared-memory tiling.
// Each block produces one TILE_WIDTH x TILE_WIDTH tile of C; the shared
// k-dimension is consumed one tile at a time, staged through shared memory.
// Precondition: blockDim.x == blockDim.y == TILE_WIDTH.
__global__ void matrixMultiplyShared(float * A, float * B, float * C,
                     int numARows, int numAColumns,
                     int numBRows, int numBColumns,
                     int numCRows, int numCColumns) {
    // Bug fix: the original mixed two different tile constants (TILE_WIDTH
    // and TILE_WIDTH_I) for sizing, indexing, and the loop bound; a single
    // constant is used consistently here.
    __shared__ float ds_A[TILE_WIDTH][TILE_WIDTH];
    __shared__ float ds_B[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x;
    int ty = threadIdx.y;

    int Row = blockIdx.y * TILE_WIDTH + ty;
    int Col = blockIdx.x * TILE_WIDTH + tx;
    float cValue = 0.0f;

    // Bug fix: ceil-division. The original truncated (numAColumns/TILE_WIDTH),
    // silently skipping the final partial tile whenever numAColumns is not a
    // multiple of TILE_WIDTH — defeating the zero-padding boundary handling.
    int numTiles = (numAColumns + TILE_WIDTH - 1) / TILE_WIDTH;

    for (int m = 0; m < numTiles; m++) {
        // Stage one tile of A and one tile of B, zero-filling out-of-range
        // elements so the inner product below needs no per-element guards.
        if (Row < numARows && m * TILE_WIDTH + tx < numAColumns) {
            ds_A[ty][tx] = A[Row * numAColumns + m * TILE_WIDTH + tx];
        } else {
            ds_A[ty][tx] = 0.0f;
        }

        if (m * TILE_WIDTH + ty < numBRows && Col < numBColumns) {
            ds_B[ty][tx] = B[(m * TILE_WIDTH + ty) * numBColumns + Col];
        } else {
            ds_B[ty][tx] = 0.0f;
        }

        // Every thread reaches this barrier (the branches above converge),
        // guaranteeing the tiles are fully populated before they are read.
        __syncthreads();

        if ((Row < numCRows) && (Col < numCColumns)) {
            for (int k = 0; k < TILE_WIDTH; k++) {
                cValue += ds_A[ty][k] * ds_B[k][tx];
            }
        }

        // Wait before the next iteration overwrites the shared tiles.
        __syncthreads();
    }

    if ((Row < numCRows) && (Col < numCColumns)) {
        C[Row * numCColumns + Col] = cValue;
    }
}

Charles Menguy
  • 40,830
  • 17
  • 95
  • 117
  • Does Not work. check http://pastebin.com/dUvz56sR. Input multMatrix([[2,2], [3,3]], [[2,2], [3,3]]). Output: [[10, 10], [15,15]]. Expected output: [[8, 12], [12, 18]] – Ricardo Augusto Jan 13 '13 at 01:42
  • There seems to work with matrices with different amounts of rows. Eg multMatrix([[2,2], [3,3]], [[2,2], [3,3], [4,4]]) – Ricardo Augusto Jan 13 '13 at 01:46
  • Are you sure you are passing the right parameters through PyCuda? Can you share what output it produces for your example? These kernels work fine for me for many different matrices on Cuda. For your example, [[2, 2], [3, 3]] * [[2, 2], [3, 3]], the output should be [[10, 10], [15, 15]] and not [[8, 12], [8, 12]], how do you arrive at that result? – Charles Menguy Jan 13 '13 at 01:51
  • I'm sorry. I ended up making a mess. I asked for an example of multiplication when actually been wanting another operation. Confuse everything. But the above snippet code helps me a lot. Thank you. – Ricardo Augusto Jan 22 '13 at 01:18