Why does BLAS have a gemm function for matrix-matrix multiplication and a separate gemv function for matrix-vector multiplication? Isn't matrix-vector multiplication just a special case of matrix-matrix multiplication where one matrix has only one row/column?

- [dgemm](http://www.netlib.org/blas/dgemm.f) and [dgemv](http://www.netlib.org/blas/dgemv.f): Fortran 77 double-precision versions of the discussed functions, for the curious. Also just want to add that this is a really important (and often used) special case where special optimizations might be possible, even if that doesn't show in the F77 versions. – user786653 Aug 15 '11 at 16:48
- It's also interesting to compare the performance of gemm and gemv for vector-matrix multiplication. – constructor Dec 30 '15 at 10:20
3 Answers
Mathematically, matrix-vector multiplication is a special case of matrix-matrix multiplication, but that's not necessarily true of them as realized in a software library.
They support different options. For example, gemv supports strided access to the vectors on which it is operating, whereas gemm does not support strided matrix layouts. In the C language bindings, gemm requires that you specify the storage ordering of all three matrices, whereas that is unnecessary in gemv for the vector arguments because it would be meaningless.
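For instance, in the CBLAS interface the vector strides appear as explicit `incX`/`incY` arguments of `cblas_dgemv`, while `cblas_dgemm` only has the leading dimensions `lda`/`ldb`/`ldc`. A minimal sketch (the sizes and strided layout here are made up for illustration):

```c
#include <cblas.h>

int main(void)
{
    double A[9] = {1, 0, 0,  0, 1, 0,  0, 0, 1};  /* 3x3 identity, column-major */
    double x[5] = {1, 9, 2, 9, 3};                /* payload at every 2nd slot  */
    double y[5] = {0};

    /* gemv: incX = incY = 2 walks the strided vectors directly. */
    cblas_dgemv(CblasColMajor, CblasNoTrans, 3, 3,
                1.0, A, 3, x, 2, 0.0, y, 2);

    /* gemm: no vector strides; x must be handed over as a contiguous
     * 3x1 matrix, described only by its leading dimension. */
    double xc[3] = {1, 2, 3}, yc[3];
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                3, 1, 3, 1.0, A, 3, xc, 3, 0.0, yc, 3);

    return 0;
}
```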
Besides supporting different options, there are families of optimizations that can be performed on gemm that are not applicable to gemv. If you know you are doing a matrix-vector product, you don't want the library to waste time figuring that out before switching into the code path optimized for that case; you'd rather call it directly.

- gemm uses the `lda, ldb, ldc` arguments, which are the row/column strides, and with them you can express the same thing for a one-column matrix as the `inc` parameter does when passing a vector. So it ends up equivalent. – bluss Mar 16 '16 at 22:18
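That equivalence can indeed be spelled out, though only for positive strides (gemv additionally allows negative increments, which leading dimensions cannot express). A sketch, assuming column-major CBLAS: view the strided vector as a 1×n row matrix whose leading dimension plays the role of `inc`:

```c
#include <cblas.h>

/* y = A*x with strided x and y, two ways (A is n x n, column-major,
 * leading dimension n; incx and incy must be positive here). */
void strided_two_ways(int n, const double *A,
                      const double *x, int incx, double *y, int incy)
{
    /* Directly: gemv takes the vector strides explicitly. */
    cblas_dgemv(CblasColMajor, CblasNoTrans, n, n,
                1.0, A, n, x, incx, 0.0, y, incy);

    /* Via gemm: treat x as a 1 x n row matrix with leading dimension
     * incx and compute the 1 x n result y^T = x^T * A^T, whose leading
     * dimension is incy. Element (0,k) of x lives at x[k*incx]. */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,
                1, n, n, 1.0, x, incx, A, n, 0.0, y, incy);
}
```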
When you optimize gemv and gemm, different techniques apply:
- For the matrix-matrix operation you use blocked algorithms, with block sizes chosen to match the cache sizes.
- For the matrix-vector product you use so-called fused Level 1 operations (e.g. fused dot products or fused axpy operations).
Both techniques are sketched below. Let me know if you want more details.
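A minimal sketch of both ideas in plain C (toy code rather than a tuned kernel; column-major storage, square matrices, and an arbitrary block size `NB` are assumed):

```c
#include <stddef.h>

enum { NB = 64 };  /* block size; real libraries tune this per cache level */

/* Blocked C += A*B (all n x n, column-major): the outer loops walk
 * NB x NB tiles so each tile stays in cache while it is reused. */
static void gemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t jj = 0; jj < n; jj += NB)
        for (size_t kk = 0; kk < n; kk += NB)
            for (size_t ii = 0; ii < n; ii += NB)
                for (size_t j = jj; j < jj + NB && j < n; ++j)
                    for (size_t k = kk; k < kk + NB && k < n; ++k) {
                        double b = B[k + j * n];
                        for (size_t i = ii; i < ii + NB && i < n; ++i)
                            C[i + j * n] += A[i + k * n] * b;
                    }
}

/* Fused Level 1 gemv: y += A*x. Handling two columns per pass fuses
 * two axpy operations, so each y[i] is loaded and stored once per
 * column pair instead of once per column. */
static void gemv_fused2(size_t n, const double *A, const double *x, double *y)
{
    size_t j = 0;
    for (; j + 1 < n; j += 2) {
        double x0 = x[j], x1 = x[j + 1];
        const double *a0 = A + j * n, *a1 = A + (j + 1) * n;
        for (size_t i = 0; i < n; ++i)
            y[i] += a0[i] * x0 + a1[i] * x1;   /* two fused axpys */
    }
    if (j < n)                                  /* odd leftover column */
        for (size_t i = 0; i < n; ++i)
            y[i] += A[i + j * n] * x[j];
}
```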

- 2,934
- 2
- 17
- 19
- Is it possible to say that gemv() in most cases has better performance than gemm()? – constructor Dec 30 '15 at 10:14
- Yes, for an actual matrix-vector product gemv has better performance (assuming you don't compare a bad gemv implementation with a good gemm implementation). Having said that, with a gemv operation you can never achieve peak performance. So the trick in numerical linear algebra is finding algorithmic variants (so-called block algorithms) that utilise matrix-matrix products. – Michael Lehn Jan 01 '16 at 17:09
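To make that last point concrete, here is a sketch of the kind of rewrite block algorithms perform (the helper names are hypothetical): instead of applying A to k vectors one gemv at a time, which streams A from memory k times, pack the vectors into a matrix and issue a single gemm, which can block A and reuse each tile from cache:

```c
#include <stddef.h>
#include <cblas.h>

/* Both compute Y = A * X, where X packs k column vectors
 * (everything column-major, A is n x n). */

/* k separate gemv calls: A is read from memory k times. */
void apply_gemv(int n, int k, const double *A, const double *X, double *Y)
{
    for (int j = 0; j < k; ++j)
        cblas_dgemv(CblasColMajor, CblasNoTrans, n, n,
                    1.0, A, n, X + (size_t)j * n, 1,
                    0.0, Y + (size_t)j * n, 1);
}

/* One gemm call: each tile of A can be reused across all k
 * right-hand sides, which is how gemm approaches peak flops. */
void apply_gemm(int n, int k, const double *A, const double *X, double *Y)
{
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                n, k, n, 1.0, A, n, X, n, 0.0, Y, n);
}
```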
I think it just fits the BLAS hierarchy better, with its Level 1 (vector-vector), Level 2 (matrix-vector) and Level 3 (matrix-matrix) routines. And it may be optimizable a bit better if you know the operand is only a vector.
