Here is an example sgemm program:

#include <mkl.h>
#include <iostream>
#include <cstdlib>
#define ITERATION 1

int main()
{
  int ra = 128;
  int lda = 75;
  int ldb = 55;
  float* left = (float*)calloc(ra * lda, sizeof(float));
  float* right = (float*)calloc(ldb * lda, sizeof(float));
  float* ans = (float*)calloc(ra * ldb, sizeof(float));
  std::cout << "left " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < lda; ++j) {
      left[i * lda + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << left[i * lda + j] << " ";
    }
    std::cout << std::endl;
  }

  std::cout << "right " << std::endl;
  for (int i = 0; i < lda; ++i) {
    for (int j = 0; j < ldb; ++j) {
      right[i * ldb + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << right[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  for (int i = 0; i < ITERATION; ++i) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ra, ldb, lda, 1.0f, left, lda,
      right, ldb, 0.0f, ans, ldb);
  }

  std::cout << "ans " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < ldb; ++j) {
      std::cout << ans[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  // Release the buffers allocated with calloc above.
  free(left);
  free(right);
  free(ans);
  return 0;
}

I compile this program with g++ using the options -fopenmp -lmkl_rt, and run it with OMP_NUM_THREADS set to 16.

When I run the program, the answer is completely wrong compared to the MATLAB result. I would not call it wrong if there were only small accuracy differences. Further, I observe that the program gives the correct result under these conditions:

  1. Use icc instead of g++
  2. Remove the -fopenmp flag
  3. Use g++ with ATLAS instead of icc with MKL
  4. Set OMP_NUM_THREADS=1

Therefore, I suspect the problem lies with the -fopenmp flag. Can you help me figure out the cause? Thank you!

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

icc (ICC) 16.0.3 20160415

Linux core 2.6.32-279.el6.x86_64

chain ro

1 Answer

According to the MKL Link Line Advisor, you don't need -fopenmp to enable multi-threading when you link the single dynamic library -lmkl_rt. As your gcc is old, this may be the problem.

You could try traditional dynamic linking and compare the following two settings to see which component is at fault.

Threaded MKL + GNU OpenMP

Link options: -Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -lm -ldl           
Compile options: -fopenmp -m64 -I${MKLROOT}/include

Threaded MKL + Intel OpenMP

Link options: -Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl
Compile options: -m64 -I${MKLROOT}/include
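
To compare the two settings without going back to MATLAB, one option is to check the sgemm output against a naive reference product inside the program itself. The following is only a sketch: the helper names reference_gemm and max_rel_diff are hypothetical (they are not part of MKL), and the tolerance you compare against is an assumption that you would tune for your data.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: naive row-major product C (m x n) = A (m x k) * B (k x n).
// Sums are accumulated in double so the reference is at least as accurate
// as the single-precision result it is compared against.
std::vector<float> reference_gemm(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  int m, int n, int k) {
  std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double sum = 0.0;
      for (int p = 0; p < k; ++p) {
        sum += static_cast<double>(a[i * k + p]) * static_cast<double>(b[p * n + j]);
      }
      c[i * n + j] = static_cast<float>(sum);
    }
  }
  return c;
}

// Hypothetical helper: largest element-wise difference between x and y,
// scaled by max(1, |y[i]|) so that large entries do not dominate.
float max_rel_diff(const std::vector<float>& x, const std::vector<float>& y) {
  float worst = 0.0f;
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    float scale = std::max(1.0f, std::fabs(y[i]));
    worst = std::max(worst, std::fabs(x[i] - y[i]) / scale);
  }
  return worst;
}
```

In the program from the question you would call reference_gemm with m = ra, n = ldb, k = lda, copy ans into a std::vector<float>, and print max_rel_diff for each build. Ordinary single-precision rounding should give a very small value (something like 1e-4 is a generous threshold for these sizes, but that number is an assumption), while a genuinely broken threading setup should show a much larger difference.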
kangshiyin
  • Thank you! Although I still don't know why I get this error, after setting the linking configuration, the result is correct. By the way, I get this error even on GCC 4.8.4. – chain ro Jul 01 '16 at 12:11
  • @chainro It may be a bug in mkl_rt. It is new. – kangshiyin Jul 01 '16 at 12:19