How do you use cblas_dgemm to do a vector outer product?

Question

I am trying to do a Column Vector multiplication with a Row Vector. Can I use dgemm?

In other words D = A * B where D is a Matrix, A is a Column Vector and B is a Row Vector.

I followed the documentation found here https://software.intel.com/en-us/node/520775. I cannot seem to get the parameters right for cblas_dgemm

Here is my try. In my case m = nRows, n = nCols, k = 1

The problem seems to be lda, ldb and ldc. I have defined them respectively as nCols, k, nRows.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>

#include <math.h>
#include <mkl.h>

#define nCols 5
#define nRows 20
#define k 1

void PrintMatrix(double* pMatrix, const size_t nR, const size_t nC, const CBLAS_ORDER Order) {
    unsigned int i, j;
    if (Order == CblasRowMajor)
    {
        for (i = 0; i < nR; i++)
        {
            for (j = 0; j < nC; j++)
            {
                printf("%f \t ", pMatrix[i * nC + j]); // !!!
            }
            printf("\n"); // !!!
        }

    }

    else

    {
        for (i = 0; i < nR; i++) {
            for (j = 0; j < nC; j++) {
                printf("%f \t ", pMatrix[i + j* nR ]); // !!!
            }
            printf("\n"); // !!!
        }

    }
    printf("\n"); // !!!

}

int main(void) {

    double A[] = { 8, 4, 7, 3, 5, 1, 1, 3, 2, 1, 2, 3, 2, 0, 1, 1 , 2, 3, 4, 1};

    double B[] = { -1, 2, -1, 1, 2 };


    double alpha = 1.0, beta = 0.0;
    int i, lda, ldb, ldc;
    double *C, *D;
    D = (double*) malloc(nRows * nCols * sizeof(double));
    C = (double*) malloc(nRows * nCols * sizeof(double));

    for (i = 0; i < nRows*nCols; i++)
        D[i] = 0.0;
    for (i = 0; i < nRows*nCols; i++)
        C[i] = 0.0;

    lda = nCols;
    ldb = k;
    ldc = nRows;

    cblas_dger(CblasRowMajor, nRows, nCols, alpha, A, 1, B, 1, C, nCols);

    PrintMatrix(C, nRows, nCols,CblasRowMajor);
    cblas_dgemm (CblasRowMajor, CblasNoTrans, CblasNoTrans, nRows,  nCols, k, alpha, A, lda, B, ldb, beta, D, ldc);

    PrintMatrix(D, nRows, nCols, CblasRowMajor);

    free(D);
    free(C);

    return 0;
}

This solved the issue for me. https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/733745 — optimcode, Feb 15 '18 at 14:51

score 4 · Accepted Answer · edited May 23 '17 at 12:16

4

Short answer is, yes you can use dgemm for rank-1 update. The dger is of course suggested, since it is expected to be better optimized for this operation.

As far as the use of cblas_dgemm . As you know the definition of leading dimension is:

lda: The size of the first dimension of matrix A

The operation you are trying to perform is: D(20x5) = A(20x1) * B(1x5)

You are using CblasRowMajor so leading dimension the number of columns for all matrices (for explanation see https://stackoverflow.com/a/30208420/2707697). Meaning:

lda = 1;
ldb = 5;
ldc = 5;

edited May 23 '17 at 12:16

Community

1
1

answered May 16 '15 at 07:23

ztik

3,482
16
34

Many thanks ctheo. I tried that, however I am getting the same issues. it is crashing. – optimcode May 16 '15 at 15:38
1

@themaze I checked the program and works fine on my pc. Check the `cblas.h` you include to your source. How is defined the `CblasRowMajor` and `CblasNoTrans`? – ztik May 18 '15 at 08:29
I tested this on a different linux distro. Fedora and it worked :). Thanks – optimcode May 18 '15 at 18:21
@themaze it seems that these variables are carelessly implemented in some libraries. – ztik May 18 '15 at 18:48
This is a follow up question. For some other reasons I am switched back to OpenSuSE 42.2 and the code is giving a core dump. Albeit it worked OK in Fedora 23-24. I am trying to get what is wrong. Do you have any ideas? I am using MKL 11.3 – optimcode Jan 19 '17 at 01:56
@themaze this is a rather simple example and there is no obvious issue on it. I would suggest to debug by using some other BLAS library – ztik Jan 19 '17 at 08:10
Yes it's a simple issue. I am really surprised that the same code gives a core dump on one platform but works well on a second platform. – optimcode Jan 19 '17 at 11:59

score 1 · Answer 2 · answered May 17 '15 at 23:50

The following code works. I switched to column-major because it's easier to understand the leading-dimension issues in the context of the Fortran BLAS documentation. I apologize that I was not able to solve your issue precisely as posed

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>

#include <math.h>
#include <mkl.h>

void PrintMatrix(double* pMatrix, const size_t nR, const size_t nC, const CBLAS_ORDER Order) {
    unsigned int i, j;
    if (Order == CblasRowMajor) {
        for (i = 0; i < nR; i++) {
            for (j = 0; j < nC; j++) {
                printf("%f \t ", pMatrix[i * nC + j]); // !!!
            }
            printf("\n"); // !!!
        }
    } else {
        for (i = 0; i < nR; i++) {
            for (j = 0; j < nC; j++) {
                printf("%f \t ", pMatrix[i + j* nR ]); // !!!
            }
            printf("\n"); // !!!
        }
    }
    printf("\n"); // !!!
}

int main(void)
{
    const int m = 20;
    const int n = 5;
    const int k = 1;

    double A[] = { 8, 4, 7, 3, 5, 1, 1, 3, 2, 1, 2, 3, 2, 0, 1, 1, 2, 3, 4, 1};
    double B[] = { -1, 2, -1, 1, 2 };

    double alpha = 1.0, beta = 0.0;

    double * C = (double*) malloc(m * n * sizeof(double));
    double * D = (double*) malloc(m * n * sizeof(double));

    for (int i = 0; i < m*n; i++) C[i] = 0.0;
    for (int i = 0; i < m*n; i++) D[i] = 0.0;

    int lda = 20;
    int ldb = 1;
    int ldc = 20;

    cblas_dger(CblasColMajor, m, n, alpha, A, 1, B, 1, C, ldc);

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, 
                m, n, k, 
                alpha, A, lda, 
                B, ldb, beta, 
                D, ldc);

    PrintMatrix(C, m, n, CblasRowMajor);
    PrintMatrix(D, m, n, CblasRowMajor);

    free(D);
    free(C);

    return 0;
}

Thanks Jeff. Strangely cblas_dgemm is not running on my system. — optimcode, May 18 '15 at 02:03
What version of MKL? Platform info? You might try another implementation (GSL, ATLAS, OpenBLAS) just to verify. — Jeff Hammond, May 18 '15 at 02:12
I am using the latest MKL version 11.2. I will try OpenBlas and GSL. Problem is that I am using vdMul, which seems to only be in MKL. do you know any efficient way to do element wise multiplication? — optimcode, May 18 '15 at 02:38
You need to be specific. For me, "latest" means the nightly engineering build ;-) — Jeff Hammond, May 18 '15 at 02:39
Element-wise matrix multiplication is bandwidth limited and should be fine to implement with loops. If you notice vdmul running significantly faster, post that issue and I'll try to help. — Jeff Hammond, May 18 '15 at 02:55
This eventually solved teh problem for me. https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/733745 — optimcode, Feb 15 '18 at 14:52

How do you use cblas_dgemm to do a vector outer product?

2 Answers2