
I need to write a matrix multiplication implementation that performs better than the naive method. So far I have: 1- removed false dependencies, which improved performance a lot, and 2- used a recursive approach. The next thing I want to try is loop unrolling. The problem is that every time I use it, performance gets worse, and I can't find an explanation for it. I need help. Here is the code:

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++) {
            double sum = 0;
            #pragma unroll(5)
            for (k = 0; k < K; k++)
            {
                sum += A[i + k*LDA] * B[k + j*LDB];
            }
            C[i + j*LDC] = sum;
        }
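For comparison, here is a minimal sketch of what unrolling the inner `k` loop by hand could look like. It assumes the column-major indexing used in the snippet above; the function name `matmul_unrolled4`, the unroll factor of 4, and the four separate accumulators are illustrative assumptions, not part of the assignment. Splitting `sum` into several accumulators changes the order of the floating-point additions, so results may differ from the original loop in the last bits.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions):
   computes C = A * B for column-major matrices.
   A is M x K with leading dimension LDA,
   B is K x N with leading dimension LDB,
   C is M x N with leading dimension LDC. */
void matmul_unrolled4(size_t M, size_t N, size_t K,
                      const double *A, size_t LDA,
                      const double *B, size_t LDB,
                      double *C, size_t LDC)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            /* Four independent partial sums instead of one `sum`,
               so consecutive additions do not depend on each other. */
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            size_t k = 0;

            /* Main loop: four values of k per iteration. */
            for (; k + 4 <= K; k += 4) {
                s0 += A[i + (k + 0) * LDA] * B[(k + 0) + j * LDB];
                s1 += A[i + (k + 1) * LDA] * B[(k + 1) + j * LDB];
                s2 += A[i + (k + 2) * LDA] * B[(k + 2) + j * LDB];
                s3 += A[i + (k + 3) * LDA] * B[(k + 3) + j * LDB];
            }

            double sum = (s0 + s1) + (s2 + s3);

            /* Remainder loop: handles K not divisible by 4. */
            for (; k < K; k++) {
                sum += A[i + k * LDA] * B[k + j * LDB];
            }

            C[i + j * LDC] = sum;
        }
    }
}
```

Whether this is faster than the pragma version depends on the compiler and flags, so it is only worth keeping if timing it shows a gain.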
Kazimierz Jawor
  • Why do you need to write a good implementation yourself? Are you not allowed to use existing solutions? Are you aware of how much a detailed understanding of memory affects the outcome (see http://people.freebsd.org/~lstewart/articles/cpumemory.pdf)? Have you seen http://eigen.tuxfamily.org/index.php?title=Main_Page? – Cisum Inas Nov 30 '13 at 21:41
  • I want to write a good implementation myself because I am following this course, http://www.cs.berkeley.edu/~yelick/cs194f07/, and I want to learn from it. I am doing the second assignment right now, matrix multiplication, so I want to implement it myself. – Ahmed Abdel Moneim Elket Nov 30 '13 at 21:52
  • OK :) Just out of curiosity, does the performance change if you swap the two inner loop variables, as in http://jsfiddle.net/sW5hM/? I remember from working a lot with C/C++ matrix implementations that this used to help because of memory-related issues. – Cisum Inas Nov 30 '13 at 22:01
  • Haha, yes, it makes a difference if the compiler flag that removes false dependencies and the like is not enabled. When I enable that flag the time improves, but then that variable change no longer makes a difference. – Ahmed Abdel Moneim Elket Dec 01 '13 at 09:30
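The loop reordering discussed in the comments above can be sketched as follows. This assumes the column-major layout from the question; the j, k, i order and the name `matmul_jki` are illustrative choices and not necessarily what the linked fiddle shows. With `i` innermost, both `A[i + k*LDA]` and `C[i + j*LDC]` are read with stride 1, whereas the original loop steps through `A` with stride `LDA`.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions):
   computes C = A * B for column-major matrices using the
   loop order j, k, i so the innermost loop walks down columns. */
void matmul_jki(size_t M, size_t N, size_t K,
                const double *A, size_t LDA,
                const double *B, size_t LDB,
                double *C, size_t LDC)
{
    for (size_t j = 0; j < N; j++) {
        /* Clear column j of C before accumulating into it. */
        for (size_t i = 0; i < M; i++) {
            C[i + j * LDC] = 0.0;
        }

        for (size_t k = 0; k < K; k++) {
            /* B(k, j) is constant across the innermost loop. */
            const double b = B[k + j * LDB];

            /* Contiguous access to column k of A and column j of C. */
            for (size_t i = 0; i < M; i++) {
                C[i + j * LDC] += A[i + k * LDA] * b;
            }
        }
    }
}
```

Each element of `C` still accumulates its products in increasing `k`, so the result matches the original loop; only the memory access pattern changes.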
