2

The matrix-recursion of the n x n matrices Y_t looks like this:

Y_{t} = A + \sum_{i=1}^{p} B_{i} * Y_{t-i}

A and B are given.

This is my attempt, but it runs slowly:

Y = zeros(n,n,T); %Going to fill the 3rd dimension for Y_t, t=1:T
Y(:,:,1:p) = initializingY

for t=(p+1):T
    Y(:,:,t) = A;
    for i=1:p
        Y(:,:,t) = Y(:,:,t) + B(:,:,i)*Y(:,:,t-i);
    end
end

Can you think of a more efficient way to do this?

Divakar
  • 218,885
  • 19
  • 262
  • 358
stollenm
  • 348
  • 1
  • 2
  • 11
  • Could you please change your code so we can benchmark our propositions against your approach? I.e. a minimal data example. –  Nov 12 '15 at 14:48
  • As you can see, StackOverflow doesn't support TeX. Please edit your question to replace your expression with an image if it's needed. – horchler Nov 12 '15 at 15:17
  • @horchler That's fair enough for a small formula like this though, don't you think? – Jonathan H Nov 12 '15 at 15:20
  • @Sh3ljohn: As TeX is not required or supported for SO, attempting to use it clutters questions, decreases quality, and is unhelpful to users unfamiliar with TeX. There are [straightforward workarounds](http://meta.stackexchange.com/a/164286). – horchler Nov 12 '15 at 15:29
  • @horchler I couldn't use a workaround because I didn't have the required reputation to post images – stollenm Nov 12 '15 at 20:55

2 Answers2

5

You can kill the inner loop with matrix-multiplication after some reshaping & permuting, like so -

Y = zeros(n,n,T);
%// Y(:,:,1:p) = initializingY
for t=(p+1):T
    Br = reshape(B(:,:,1:p),n,[]);
    Yr = reshape(permute(Y(:,:,t-(1:p)),[1 3 2]),[],n);
    Y(:,:,t) = A + Br*Yr;
end
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • I was trying to do exactly this; it might not seem trivial to everyone that you can actually multiply the "long-side" of `B` once resized. This is because the sum in the recursion effectively adds up matrix elements along the third dimension. That way we essentially merge the sums due to matrix multiplication on the one hand, and due to recursion on the other. – Jonathan H Nov 12 '15 at 15:19
  • @Sh3ljohn Exactly! There are reductions at two dims - One with the original `matrix-mul` and one because of the recursion, that is being done with one `matrix-mul` after that reshaping. It was quite a question, a good one really! – Divakar Nov 12 '15 at 15:20
  • @Divakar Thank you very much. This is amazing! Can you think of a similar way to do this for a recursion in quadratic form like this: Y_t= A + \sum_{i=1}^{p} B_i * Y_(t-i) * B_i ' ? – stollenm Nov 12 '15 at 21:48
3

Short of using clever mathematical tricks to reduce the number of operations, the best shot is to optimize the memory access. That is: avoid subsrefing, increase the locality of your code, reduce the cache misses by manipulating short arrays instead of large ones.

n = 50;
T = 1000;
p = 10;


Y = zeros(n,n,T);
B = zeros(n,n,p);
A = rand(n);
for t = 1:p
        Y(:,:,t) = rand(n);
        B(:,:,t) = rand(n);
end

fprintf('Original attempt: '); tic;
for t=(p+1):T
        Y(:,:,t) = A;
        for k=1:p
                Y(:,:,t) = Y(:,:,t) + B(:,:,k)*Y(:,:,t-k);
        end;
end;
toc;

%'This solution was taken from Divakar'
fprintf('Using reshaping: '); tic;
Br = reshape(B(:,:,1:p),n,[]);
for t=(p+1):T
        Yr = reshape(permute(Y(:,:,t-(1:p)),[1 3 2]),[],n);
        Y(:,:,t) = A + Br*Yr;
end;
toc;

%'proposed solution'
Y = cell(1,T);
B = cell(1,p);
A = rand(n);
for t = 1:p
        Y{t} = rand(n);
        B{t} = rand(n);
end

fprintf('Using cells: '); tic;
for t=(p+1):T
        U = A;
        for k=1:p
                U = U + B{k}*Y{t-k};
        end;
        Y{t} = U;
end;
toc;

For setups given in my example I get a two-fold speed increase for a decent machine (i5 + 4Gb, MATLAB R2012a). I am curious how well it does on your machine.

  • i5 2.7 GHz, 8GB DDR3, Matlab R2015a: 0.11 vs 0.06 sec on average. – Jonathan H Nov 12 '15 at 15:29
  • @Sh3ljohn I got quite similar results (0.12 vs 0.065). Thanks for confirming. :-) –  Nov 12 '15 at 15:31
  • Wow, this is really impressive, your solution works much better than using clever tricks actually! – Jonathan H Nov 12 '15 at 15:40
  • 1
    @Sh3ljohn IMHO, when no algorithmic complexity reduction in sight, the first effort should be to help the hardware help you. However, for this case I think that there will be no increase in speed from matrix sizes above cache size. –  Nov 12 '15 at 15:45
  • :) I just re-edited sorry I stupidly tested without actually implementing the reshaping solution; it is actually doing better than using cells, but it's still quite impressive that you can gain so much time in a high-level language like Matlab, good to know! – Jonathan H Nov 12 '15 at 15:49
  • @CST-Link Thanks for taking the time to write the benchmark code. I will post in that fashion from now on. Your code is easily readable and amazingly quick! – stollenm Nov 12 '15 at 21:09