Multiplication of corresponding 2d slices of two arrays and inversion of array slices

Question

I have two arrays A and B of the same dimension 1000 x 3 x 20 x 20. I want to generate a third array C of dimension 3 x 3 x 20 x 20 that would be an outcome of matrix multiplication of corresponding slices of A and B, i.e. C(:,:,i,j) = A(:,:,i,j)'*B(:,:,i,j). Then I need to transform array C to the new array D by inverting the corresponding 3 x 3 matrices, i.e. D(:,:,i,j) = inv(C(:,:,i,j)). Again, it's clear how to do this with loops. is there a way to awoid looping over 400 items?

Edit: The benchmarking code to compare performance of different solutions would be -

%// Inputs
n1 = 50;
n2 = 200;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);

%// A. CPU loopy code
tic
C = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj); %//'
    end
end
toc

%// B. Vectorized code (using squeeze)
tic
C1 = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));
toc

%// C. Vectorized code (avoiding squeeze)
tic
C2 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
toc

%// D. GPU vectorized code
tic
A = gpuArray(A);
B = gpuArray(B);
C3 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
C3 = gather(C3);
toc

Runtime results -

Elapsed time is 0.287511 seconds.
Elapsed time is 0.250663 seconds.
Elapsed time is 0.337628 seconds.
Elapsed time is 1.259207 seconds.

take a look here http://www.mathworks.com/matlabcentral/fileexchange/8773-multiple-matrix-multiplications--with-array-expansion-enabled — Nishant, Oct 28 '14 at 12:23
Wow! That's some useful and interesting runtime results. Thanks! — Divakar, Oct 28 '14 at 15:06
Also, one more thing I forgot to mention was that you need to "Warmup" GPU before benchmarking. So, the simplest way to achieve that would be to run the benchmarking code as it is and run it again and observe the runtimes in that second run. The trusted method to benchmark GPU codes is with `gputimeit`, but that would complicate the codes, so staying away from that for now. — Divakar, Oct 28 '14 at 15:21

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

1

Code

%// Part - 1
C = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);

%// Part - 2: Use MATLAB file-exchange tool multinv
D = multinv(C);

The function code for multinv is available here and it claims to be pretty efficient.

For the first part, you can also try out this -

C = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));

This one seems to be re-arranging the elements not as "disruptively" as the one mentioned in the code above, but the downside is the need for squeeze that might slow it down a bit. I would leave it to you and also encourage you to benchmark and select the better one.

Why `bsxfun` + `GPU`?

I have increased the loop limits, as that could be a real test between a loopy code and a vectorized code. So, here is the modified code for part 1 -

%// Inputs
n1 = 50;
n2 = 200;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);

%// A. CPU loopy code
tic
C = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj); %//'
    end
end
toc

%// B. GPU vectorized code
tic
A = gpuArray(A);
B = gpuArray(B);
C1 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
C1 = gather(C1);
toc

The runtime results at my system were -

Elapsed time is 0.310056 seconds.
Elapsed time is 0.172499 seconds.

So, you see!

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 28 '14 at 13:12

Divakar

218,885
19
262
358

I still lack the intuition for how the bsxfun works with multidimensional arrays. So thank you very much for answering a similar question one more time and thanks for referring to multinv. – Laimond Oct 28 '14 at 13:24
1

@Laimond The play about `bsxfun` works very closely with `permute` that is used most of times (when working with `bsxfun`) to create singleton dimensions, which are expanded by `bsxfun` internally before performing the operations mentioned by the handle `@`. It takes time, but play around with it really! – Divakar Oct 28 '14 at 13:26
2

@Laimond Also, if you have a decent GPU, you would see the crazy benefits of working with bsxfun. – Divakar Oct 28 '14 at 13:27
It seems that the second solution is slightly faster. My GPU is a very basic one, GeForce 410M. Do you mean some kind of parallel computing, or you mean that if my GPU was better, the code would run faster? – Laimond Oct 28 '14 at 14:35
@Laimond Edited the solution with the GPU codes. I would be dearly interested in seeing results at your end too if you don't mind being bothered on this. – Divakar Oct 28 '14 at 15:01
The solution with `squeeze` function is the fastest on my system. I need to update my workstation... – Laimond Oct 28 '14 at 15:20
@Laimond BTW mine was with GTX 750 Ti and that could have made the difference I believe. – Divakar Oct 28 '14 at 15:24

Multiplication of corresponding 2d slices of two arrays and inversion of array slices

1 Answers1

Why bsxfun + GPU?

Why `bsxfun` + `GPU`?