I've searched quite a bit, but I've only found homegrown reimplementations of Strassen matrix multiplication.

Wikipedia says that numpy uses BLAS (which includes high-performance implementations of sub-cubic matrix multiplication algorithms, e.g. Strassen's method), but I couldn't find out whether numpy matrix multiplication has some sort of size check and then chooses either naive $O(n^3)$ multiplication or some more sophisticated approach depending on the size of the matrix (this is similar to choosing the leaf size or recursion limit in Strassen sub-calls).

I also tried just plotting the log of the runtime vs. the log of the problem size, but detecting a subtle change in slope is non-trivial (because of cache effects, threading, etc. as the problems get larger).
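For reference, a minimal sketch of that experiment (my own illustration, not code from any numpy source): time square matrix products at a few sizes and fit a line to log(time) vs. log(n). The slope approximates the empirical exponent, though as noted above, cache effects and BLAS threading make it noisy, so the fitted value is only a rough estimate.

```python
import time
import numpy as np

# Time square matmuls at a few sizes and fit a line to the log-log points.
# The slope of that line is a rough empirical estimate of the exponent.
rng = np.random.default_rng(0)
sizes = [128, 256, 512, 1024]
times = []
for n in sizes:
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    a @ b  # warm-up so the timed call isn't paying one-off setup costs
    start = time.perf_counter()
    a @ b
    times.append(time.perf_counter() - start)

# Least-squares slope through (log n, log t): the empirical exponent.
slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
print(f"empirical exponent ~ {slope:.2f}")
```

On a typical machine this tends to come out somewhere between 2 and 3, but a single timed call per size is noisy; averaging several repetitions per size would tighten the estimate.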

Since the documentation for numpy matrix doesn't mention Strassen (or any alternative sub-cubic algorithm) or the runtime, and since the numpy source in question is in C++ for performance (the C++ code in turn uses the BLAS library), it isn't so easy to tell from the source code, so I thought I would ask:
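Since numpy defers the actual multiplication to whatever BLAS it was built against, one thing you can at least check (my own suggestion, not from the numpy docs on this topic) is which BLAS backend your numpy build is linked to:

```python
import numpy as np

# Print the build configuration of this numpy install, including which
# BLAS/LAPACK libraries (e.g. OpenBLAS, MKL, ATLAS) it is linked against.
# The algorithm used for matmul is determined by that backend, not by numpy.
np.show_config()
```

This doesn't reveal the algorithm itself, but it narrows the question down to a specific BLAS implementation whose documentation or source you can then inspect.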

Does anyone know the algorithm or big-O runtime of a `numpy.matrix(...) * numpy.matrix(...)` call?

smci
  • I think it is not available since it reduces the numerical stability; however, I have not found a reliable source for this hypothesis. Moreover, there are faster algorithms like the Coppersmith–Winograd algorithm. The standard algorithm is O(n^3), Strassen about O(n^2.8), and Coppersmith–Winograd roughly O(n^2.4). As far as I remember, Coppersmith–Winograd is also more stable. – daniel451 Nov 02 '15 at 07:02
  • 3
    [I don't think Coppersmith-Winograd has actually been implemented](http://mathoverflow.net/a/101532) – zehnpaard Nov 02 '15 at 07:46
  • As you already found out, neither `numpy` nor `scipy` actually implements low-level math routines; they use existing BLAS libraries. So basically the question depends on the BLAS implementation you are using. That said, I doubt that any of the high-performance math routine implementations uses a naive algorithm :) – cel Nov 02 '15 at 08:01
  • 2
    Exactly which algorithm gets used would (in principle) depend on which BLAS library numpy is linked against, rather than on numpy itself. In practice I'm not aware of any BLAS implementation that uses Strassen multiplication, presumably because of the high memory cost and relatively poor numerical stability. – ali_m Nov 02 '15 at 18:47
  • Has anyone found an answer, or does anyone have advice on either Coppersmith–Winograd or Williams matmul? I don't care about numerical stability; I'm allowed to fail some of the time. – mhnatiuk May 22 '20 at 20:29
