I have two arrays A
and B
of the same dimension 1000 x 3 x 20 x 20
. I want to generate a third array C
of dimension 3 x 3 x 20 x 20
that would be an outcome of matrix multiplication of corresponding slices of A
and B
, i.e. C(:,:,i,j) = A(:,:,i,j)'*B(:,:,i,j)
. Then I need to transform array C
to the new array D
by inverting the corresponding 3 x 3
matrices, i.e. D(:,:,i,j) = inv(C(:,:,i,j))
. Again, it's clear how to do this with loops. is there a way to awoid looping over 400
items?
Edit: The benchmarking code to compare performance of different solutions would be -
%// Inputs
n1 = 50;
n2 = 200;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);
%// A. CPU loopy code
tic
C = zeros(3,3,n2,n2);
for ii = 1:n2
for jj = 1:n2
C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj); %//'
end
end
toc
%// B. Vectorized code (using squeeze)
tic
C1 = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));
toc
%// C. Vectorized code (avoiding squeeze)
tic
C2 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
toc
%// D. GPU vectorized code
tic
A = gpuArray(A);
B = gpuArray(B);
C3 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
C3 = gather(C3);
toc
Runtime results -
Elapsed time is 0.287511 seconds.
Elapsed time is 0.250663 seconds.
Elapsed time is 0.337628 seconds.
Elapsed time is 1.259207 seconds.