Efficiency of diag() - MATLAB

Question

Motivation:

In writing out a matrix operation that was to be performed over tens of thousands of vectors I kept coming across the warning:

Requested 200000x200000 (298.0GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.

The reason for this was my use of diag() to get the values down the diagonal of an matrix inner product. Because MATLAB is generally optimized for vector/matrix operations, when I first write code, I usually go for the vectorized form. In this case, however, MATLAB has to build the entire matrix in order to get the diagonal which causes the memory and speed issues.

Experiment:

I decided to test the use of diag() vs a for loop to see if at any point it was more efficient to use diag():

num = 200000; % Matrix dimension
x = ones(num, 1);
y = 2 * ones(num, 1);

% z = diag(x*y'); % Expression to solve

% Loop approach
tic
z = zeros(num,1);
for i = 1 : num
   z(i) =  x(i)*y(i);
end
toc

% Dividing the too-large matrix into process-able chunks
fraction = [10, 20, 50, 100, 500, 1000, 5000, 10000, 20000];
time = zeros(size(fraction)); 
for k = 1 : length(fraction)
    f = fraction(k);

    % Operation to time
    tic
    z = zeros(num,1);
    for i = 1 : k
        first = (i-1) * (num / f);
        last = first + (num / f);
        z(first + 1 : last) = diag(x(first + 1: last) * y(first + 1 : last)');
    end
    time(k) = toc;
end

% Plot results
figure;
hold on
plot(log10(fraction), log10(chunkTime));
plot(log10(fraction), repmat(log10(loopTime), 1, length(fraction)));
plot(log10(fraction), log10(chunkTime), 'g*'); % Plot points along time
legend('Partioned Running Time', 'Loop Running Time');
xlabel('Log_{10}(Fractional Size)'), ylabel('Log_{10}(Running Time)'), title('Running Time Comparison');

This is the result of the test: (NOTE: The red line represents the loop time as a threshold--it's not to say that the total loop time is constant regardless of the number of loops)

From the graph it is clear that it takes breaking the operations down into roughly 200x200 square matrices to be faster to use diag than to perform the same operation using loops.

Question:

Can someone explain why I'm seeing these results? Also, I would think that with MATLAB's ever-more optimized design, there would be built-in handling of these massive matrices within a diag() function call. For example, it could just perform the i = j indexed operations. Is there a particular reason why this might be prohibitive?

I also haven't really thought of memory implications for diag using the partition method, although it's clear that as the partition size decreases, memory requirements drop.

I guess [this question](http://stackoverflow.com/questions/23681337/does-matlab-optimize-diagab?rq=1) talks about MATLAB's optimization of `diag` — marcman, Dec 22 '15 at 15:31
Also, using `sum(A.*B',2)` doesn't solve the `diag` conundrum when you're using non-square matrices (e.g `diag([m x 2] * [2 x 3] * [3 x 3] * [3 x n])`) — marcman, Dec 22 '15 at 16:00
Umm... I don't get it. In your question you only have vectors, and `diag(x*y')` is the same as `x(1:n).*y(1:n)` where `n=min(length(x),length(y))`. How do matrices come into the picture? The problem in both memory use and runtime is that for a `diag` vector of length `n` you're allocating and computing `n^2-n` elements which you promptly throw away. — Andras Deak -- Слава Україні, Dec 22 '15 at 16:09
Also: "`Also, I would think that with MATLAB's ever-more optimized design, there would be built-in handling of these massive matrices within a diag() function call. For example, it could just perform the i = j indexed operations. Is there a particular reason why this might be prohibitive?`": in an expression `diag(A*B)` the temporary matrix `C=A*B` is necessarily computed and allocated, then passed to `diag`. `diag` doesn't have a chance to ignore those elements: they are already computed by the time `diag` is called. — Andras Deak -- Слава Україні, Dec 22 '15 at 16:13
This is not an inner product. It's a dyadic product and the diagonals are elementwise multiplication `x.*y'` — percusse, Dec 22 '15 at 16:15
@percusse: I know this is not an inner product. However, for the test is serves the purpose of building a large matrix from which I am grabbing the diagonal. I'm just using a matrix inner product in my other work — marcman, Dec 22 '15 at 16:57
@AndrasDeak: I'm just using the vectors to form the matrix in the test. Is it really that hard for everyone to separate the problem that motivates a question from the question itself? — marcman, Dec 22 '15 at 16:58
I'm confused what this question is about: (1) efficiency of diag vs. writing your own loop? or (2) why can't I create a 290GB matrix? or (3) how can I do element wise matrix multiplication? Answer to (3) is use `.*`, and answer to (2) is because you don't have that much memory. — Matthew Gunn, Dec 22 '15 at 17:02
@MatthewGunn: I guess it's not as clear as I thought. I'm just trying to compare calculating along a diagonal using matrix/vector multiplication inside a `diag` call to calculating the diagonal products individually within a loop. — marcman, Dec 22 '15 at 17:06
@marcman my question is entirely not unrelated to your problem. You're trying to compare apples and oranges, and I suspect that you either need apples, *or* oranges, and one of the two is uncalled-for. — Andras Deak -- Слава Україні, Dec 22 '15 at 18:14

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

3

Test of speed of diag vs. a loop.

Initialization:

n = 10000;
M = randn(n, n);  %create a random matrix.

Test speed of diag:

tic;
d = diag(M);
toc;

Test speed of loop:

tic;
d = zeros(n, 1);
for i=1:n
   d(i) = M(i,i);
end;
toc;

This would test diag. Your code is not a clean test of diag...

Comment on where there might be confusion

Diag only extracts the diagonal of a matrix. If x and y are vectors, and you do d = diag(x * y'), MATLAB first constructs the n by n matrix x*y' and calls diag on that. This is why, you get the error, "cannot construct 290GB matrix..." Matlab interpreter does not optimize in a crazy way, realize you only want the diagonal and construct just a vector (rather than full matrix with x*y', that does not happen.

Not sure if you're asking this, but the fastest way to calculate d = diag(x*y') where x and y are n by 1 vectors would simply be: d = x.*y

edited Jun 20 '20 at 09:12

Community

1
1

answered Dec 22 '15 at 17:15

Matthew Gunn

4,451
1
12
30

Thank you. I understand it doesn't do that. As I asked in my OP, is there a particular reason why? It seems like that's an optimization that shouldn't be too difficult to implement. Is there something prohibitive that I'm missing? – marcman Dec 22 '15 at 17:18
@marcman diag is not a computational command. It is a matrix *mask* (roughly speaking). It cannot mask the diagonals without first forming the matrix. – percusse Dec 22 '15 at 17:23
@marcman It's insanely more problematic than you think. If you learn how just in time compilers work, how function calls actually work, how compiler optimizations work, etc... it's really not something feasible. – Matthew Gunn Dec 22 '15 at 17:24
Gotcha. Fair enough. That's basically the explanation what I was looking for – marcman Dec 22 '15 at 17:29
Also, to your last piece in the answer--yes `x.*y` would be better in that case. I was just using the outer product to build a test matrix – marcman Dec 22 '15 at 17:31
3

@marcman vector vector multiply `*` resulting in a matrix probably calls the BLAS function `dgemm.` You're asking MATLAB to dynamically figure out it can replace a call to dgemm and diag with a single call to [dsbmv](http://www.netlib.org/blas/dsbmv.f). There's basically NO way for Matlab to figure this out from the code of dgemm etc.. alone. Another approach might treat MATLAB code as some symbolic expression and then solve an optimization problem for the most efficient code to implement that expression: not easy! Anyway, there are insane problems going beyond specific hacks. – Matthew Gunn Dec 22 '15 at 17:46
Not to mention that in a call `diag(A*B)`, the called function never sees the original variables (`A` and `B`), only the resulting temporary array (`C=A*B`), which already has every element computed. – Andras Deak -- Слава Україні Dec 22 '15 at 18:25
@MatthewGunn: Thank you for the detailed explanation. That was exactly the type of answer I was looking for. – marcman Dec 23 '15 at 04:08

Efficiency of diag() - MATLAB

1 Answers1

Test of speed of diag vs. a loop.

Comment on where there might be confusion