
I've been working on a rather extensive program as of late, and I'm currently at a point where I have to use matrix multiplication. The thing is, for this particular program speed is crucial. I'm familiar with a number of matrix setups, but I would like to know which method will run the fastest. I've done extensive research, but it turned up very few results. Here is a list of the matrix multiplication algorithms I am familiar with:

  • Iterative algorithm (a minimal sketch follows this list)
  • Divide and Conquer algorithm
  • Sub-cubic algorithms
  • Shared Memory Parallelism
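
For reference, here is a minimal sketch of the iterative (naive) algorithm in C++, assuming square matrices stored in row-major order; the (i, k, j) loop order is a common cache-friendly variant:

```cpp
#include <cstddef>
#include <vector>

// Naive O(n^3) multiply: C = A * B for n x n row-major matrices.
// The (i, k, j) loop order streams through B and C row by row,
// which is friendlier to the cache than the textbook (i, j, k) order.
std::vector<double> multiply(const std::vector<double>& A,
                             const std::vector<double>& B,
                             std::size_t n) {
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
    return C;
}
```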

If anyone needs clarification on the methods I listed, or on the question in general, feel free to ask.

  • A: Hand-tuned libraries developed by specialists with detailed knowledge and experience of the architecture of the processor on which the code will be executing; in other words, don't roll your own: beg, borrow or steal an implementation. Oh, or actually buy one. – High Performance Mark Nov 05 '15 at 16:07
  • This question is too broad. Your matrix can be big, small, sparse, dense... There is no best algorithm for every context. Note that shared memory parallelism is not an algorithm, and there are algorithms which behave better or worse depending on the parallel architecture you are on. – coincoin Nov 05 '15 at 16:12
  • Have a look at a [related post](http://stackoverflow.com/questions/4455645/what-is-the-best-matrix-multiplication-algorithm?rq=1) – Axel Kemper Nov 05 '15 at 16:17

2 Answers


The Strassen algorithm and the naive O(n^3) one are the most widely used in practice.
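
To make the divide-and-conquer idea concrete, here is a sketch of Strassen's scheme in C++, under the simplifying assumptions that the matrices are square and row-major and that n is a power of two (real implementations pad or peel odd dimensions; the cutoff of 64 below is an arbitrary illustrative choice):

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // square, row-major, size n*n

// Naive base case: C += A * B (n x n, row-major).
static void naive(const Mat& A, const Mat& B, Mat& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Strassen: assumes n is a power of two.
Mat strassen(const Mat& A, const Mat& B, std::size_t n) {
    Mat C(n * n, 0.0);
    if (n <= 64) {  // below the cutoff, the naive loop is faster
        naive(A, B, C, n);
        return C;
    }
    const std::size_t h = n / 2;
    // Copy out the quadrant of M whose top-left corner is (r, c).
    auto sub = [&](const Mat& M, std::size_t r, std::size_t c) {
        Mat S(h * h);
        for (std::size_t i = 0; i < h; ++i)
            for (std::size_t j = 0; j < h; ++j)
                S[i * h + j] = M[(r + i) * n + (c + j)];
        return S;
    };
    // Elementwise X + s*Y (s = +1 or -1).
    auto add = [&](const Mat& X, const Mat& Y, double s) {
        Mat R(h * h);
        for (std::size_t i = 0; i < h * h; ++i) R[i] = X[i] + s * Y[i];
        return R;
    };
    Mat A11 = sub(A, 0, 0), A12 = sub(A, 0, h), A21 = sub(A, h, 0), A22 = sub(A, h, h);
    Mat B11 = sub(B, 0, 0), B12 = sub(B, 0, h), B21 = sub(B, h, 0), B22 = sub(B, h, h);

    // The seven half-size products that replace the usual eight.
    Mat M1 = strassen(add(A11, A22, 1), add(B11, B22, 1), h);
    Mat M2 = strassen(add(A21, A22, 1), B11, h);
    Mat M3 = strassen(A11, add(B12, B22, -1), h);
    Mat M4 = strassen(A22, add(B21, B11, -1), h);
    Mat M5 = strassen(add(A11, A12, 1), B22, h);
    Mat M6 = strassen(add(A21, A11, -1), add(B11, B12, 1), h);
    Mat M7 = strassen(add(A12, A22, -1), add(B21, B22, 1), h);

    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < h; ++j) {
            const std::size_t k = i * h + j;
            C[i * n + j]             = M1[k] + M4[k] - M5[k] + M7[k]; // C11
            C[i * n + (j + h)]       = M3[k] + M5[k];                 // C12
            C[(i + h) * n + j]       = M2[k] + M4[k];                 // C21
            C[(i + h) * n + (j + h)] = M1[k] - M2[k] + M3[k] + M6[k]; // C22
        }
    return C;
}
```

Each level of recursion replaces 8 half-size multiplications with 7 at the cost of extra additions, which is where the O(n^log2(7)) ≈ O(n^2.81) bound comes from; below the cutoff the extra additions and allocations dominate, so the naive loop wins.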

More complex algorithms with tighter asymptotic bounds, such as the Coppersmith–Winograd algorithm, are not used in practice because their benefits would become apparent only for extremely large matrices.

As others pointed out, you might want to use a library like ATLAS, which will automatically tune the algorithm to the characteristics of the platform you are executing on, e.g. L1/L2 cache sizes.
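
Calling such a library typically comes down to a single GEMM call. Here is a minimal example against the standard CBLAS interface (which ATLAS, OpenBLAS and MKL all provide); the matrix size of 512 is just an illustrative choice:

```cpp
#include <vector>
#include <cblas.h>  // standard CBLAS interface, provided by ATLAS among others

int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // C = 1.0 * A * B + 0.0 * C, row-major, neither operand transposed.
    // The library chooses blocking and kernels tuned for the host CPU.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n,
                B.data(), n,
                0.0, C.data(), n);
    return 0;
}
```

With ATLAS this would link with something like `g++ gemm.cpp -lcblas -latlas` (library names vary by distribution).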

igon
  • I am A grapefruit! – coincoin Nov 05 '15 at 16:19
  • What evidence for the first statement in this answer can the OP (or anyone else) provide? – High Performance Mark Nov 05 '15 at 17:10
  • There are a lot of papers studying the subject. Looking at [this](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6121273), it seems like Strassen achieves a significant speedup for large matrices, while the benefits of Coppersmith are very limited. Strassen itself might not be convenient at times on systems with complex memory hierarchies; [this](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1592249) paper suggests adding an additional auto-tuning step to ATLAS to automatically decide whether to use Strassen or the naive method for DGEMM. – igon Nov 05 '15 at 17:23
  • *There are a lot of papers studying the subject* There certainly are, but little evidence that the widely used libraries for matrix computations actually employ the Strassen algorithm. As you write *Strassen itself might not be convenient at times on systems with complex memory hierarchies* and that just about covers most current CPU-based computers. – High Performance Mark Nov 05 '15 at 18:58
  • Sure, although the key point was that even for a fixed platform the best method also depends on the size of the multiplication. Anyway I changed my answer to reflect the fact that the jury is still out on this one. – igon Nov 05 '15 at 21:31

The quickest way might be to use an existing library that's already optimized; you don't have to reinvent the wheel every time.
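
For example, with a header-only library such as Eigen (one choice among many; any tuned BLAS-backed library works the same way), the whole thing collapses to one expression:

```cpp
#include <Eigen/Dense>  // header-only linear algebra library
#include <iostream>

int main() {
    const int n = 512;  // illustrative size
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);
    Eigen::MatrixXd C = A * B;  // dispatches to Eigen's tuned GEMM kernel
    std::cout << C(0, 0) << "\n";
    return 0;
}
```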

Mathieu Borderé