
For a class, my teacher posed a question about the algorithmic cost of multiplying a matrix by its transpose. With the standard three-loop matrix multiplication algorithm, the cost is O(n^3), and I wonder if there is a way to exploit the structure of matrix * matrix-transpose to get a faster algorithm. I understand that when you multiply a matrix by its transpose you have to calculate less of the result because it's symmetric, but I can't think of how to manipulate an algorithm so that it takes less than O(n^3).
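For reference, here is a minimal sketch of the standard three-loop algorithm applied to A * A^T (the function name is mine, not from any library); note the inner loop only ever reads rows of A, since (A^T)[k][j] == A[j][k]:

```python
def mat_mul_transpose_naive(A):
    # Compute C = A * A^T with the textbook triple loop: O(n^2 * m)
    # for an n x m matrix A stored as a list of row lists.
    n, m = len(A), len(A[0])
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(m):
                s += A[i][k] * A[j][k]  # (A^T)[k][j] is just A[j][k]
            C[i][j] = s
    return C
```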

I know there are algorithms like Coppersmith–Winograd and Strassen that are faster general matrix multiplication algorithms, but could anyone give any hints or insights on how to computationally take advantage of the transpose?

Thanks

Matt
  • Well, A^TA is symmetric, so you only need to compute one half of the result. – n. m. could be an AI Jan 16 '12 at 22:18
  • I googled and found a related question on Math Stack Exchange http://math.stackexchange.com/questions/28395/is-it-faster-to-multiply-a-matrix-by-its-transpose-than-ordinary-matrix-multipli but it seems that no one has a better idea. – Pinch Apr 10 '12 at 09:16
  • I know this is an old question, but I have actually implemented Strassen's method. While it is theoretically faster for multiplying large matrices, in real life it is not. The additional overhead of the recursive function calls (or any trickery used to simulate the recursion without using the stack) results in computational losses that exceed the very modest gains the algorithm itself provides. I am unfamiliar with Coppersmith–Winograd, so I can't comment on that. – Pete May 05 '14 at 15:49

1 Answer


As of right now, no property of this particular multiplication is known to break the asymptotic barrier: computing A * A^T has the same asymptotic complexity as general matrix multiplication.

The obvious optimization is to take advantage of the symmetry of the product. That is to say, the [i][j]th entry is equal to the [j][i]th entry.
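A minimal sketch of that optimization (function name mine): compute only the upper triangle and mirror each entry, which roughly halves the arithmetic but leaves the complexity at O(n^3):

```python
def mat_mul_transpose_sym(A):
    # Compute C = A * A^T, exploiting symmetry: C[i][j] == C[j][i],
    # so only entries with j >= i are actually computed.
    n, m = len(A), len(A[0])
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only
            s = 0.0
            for k in range(m):
                s += A[i][k] * A[j][k]
            C[i][j] = C[j][i] = s  # mirror into the lower triangle
    return C
```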

For implementation-specific optimizations, there is a significant amount of caching you can exploit. A large fraction of the time spent multiplying big matrices goes to moving data between memory and the CPU. CPU designers therefore added a caching system: recently used memory is kept in a small, fast memory called the cache, and nearby memory is pulled in alongside it. This helps because much of the memory traffic comes from reading and writing arrays, which are stored sequentially.

Since the transpose of a matrix is simply the same matrix with the indices swapped, both operands of the product are rows of the same array, so each cached row of the matrix serves double duty and caching a value can have over twice the impact.
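To illustrate the locality point: in row-major storage, each entry of A * A^T is a dot product of two rows of A, both of which are contiguous in memory, so there is no need to materialize A^T or stride down a column. A hypothetical helper makes this explicit:

```python
def row_dot(A, i, j):
    # Entry (i, j) of A * A^T is the dot product of rows i and j of A.
    # Both rows are contiguous in row-major storage, so this scans
    # memory sequentially and is cache-friendly by construction.
    return sum(a * b for a, b in zip(A[i], A[j]))
```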

tskuzzy