Seeing as your matrices are generally quite sparse, the end result of the tensor product shouldn't take up that much memory. This is one of those cases where full vectorization simply cannot be done, due to the huge memory requirements of the intermediate computations, but it just might be possible using loops (and partial vectorization).
Note that this is a "better than nothing, but not by much" type of solution.
I'll be using the ndSparse submission from the File Exchange, as it makes it easier to work with multidimensional sparse arrays.
% Create some matrices
[A,B] = deal(sparse(1000,1000));
A(randperm(1000^2, 10000)) = randn(1E4, 1);
B(randperm(1000^2, 10000)) = randn(1E4, 1);
A = ndSparse(A); B = ndSparse(B);
% Kronecker tensor product, copied from kron.m
[ma,na] = size(A);
[mb,nb] = size(B);
A = reshape(A,[1 ma 1 na]);
B = reshape(B,[mb 1 nb 1]);
% K = reshape(A.*B,[ma*mb na*nb]); % This fails, too much memory.
K = ndSparse(sparse(mb*ma*nb*na,1),[mb, ma, nb, na]); % This works
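As a quick sanity check of the reshape trick borrowed from kron.m, here's a small full-matrix example (an illustration I'm adding, not part of the original code; it uses lowercase names so it doesn't clobber the A and B above, and bsxfun so it also runs on pre-R2016b releases):

```matlab
% The reshape-and-broadcast formulation reproduces kron on small inputs.
a = magic(3); b = reshape(1:4, 2, 2);
k1 = kron(a, b);
k2 = reshape(bsxfun(@times, reshape(a, [1 3 1 3]), ...
                            reshape(b, [2 1 2 1])), [6 6]);
isequal(k1, k2)   % true
```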
From here one can proceed depending on the available memory:
% If there's plenty of memory (2D x 1D -> 3D):
for ind1 = 1:mb
    K(ind1,:,:,:) = bsxfun(@times, A, B(ind1, :, :, :));
end

% If there's less memory (1D x 1D -> 2D):
for ind1 = 1:mb
    for ind2 = 1:ma
        K(ind1,ind2,:,:) = bsxfun(@times, A(:, ind2, :, :), B(ind1, :, :, :));
    end
end

% If there's even less memory (1D x 0D -> 1D):
for ind1 = 1:mb
    for ind2 = 1:ma
        for ind3 = 1:nb
            K(ind1,ind2,ind3,:) = bsxfun(@times, A(:, ind2, :, :), B(ind1, :, ind3, :));
        end
    end
end

% If there's absolutely no memory (0D x 0D -> 0D):
for ind1 = 1:mb
    for ind2 = 1:ma
        for ind3 = 1:nb
            for ind4 = 1:na
                K(ind1,ind2,ind3,ind4) = A(:, ind2, :, ind4) .* B(ind1, :, ind3, :);
            end
        end
    end
end
K = sparse(reshape(K,[ma*mb na*nb])); % Final touch
So this is just a theoretical demonstration of how the computation could eventually be performed. Unfortunately it is very inefficient, since it has to call class methods over and over again, and it still doesn't guarantee that there will be enough memory to later evaluate the \ operator.
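One way around the intermediate-memory problem entirely (a sketch I'm adding for illustration, not part of the approach above) is to build kron(A,B) directly in triplet form from the nonzeros of the original 2-D sparse matrices, so no 4-D intermediate is ever formed; memory then scales with nnz(A)*nnz(B), i.e. with the nonzeros of the result itself:

```matlab
% Sketch: assemble kron(A,B) from the (row, col, value) triplets of A and B.
A = sprand(100, 100, 0.01); B = sprand(100, 100, 0.01);  % example inputs
[ma, na] = size(A); [mb, nb] = size(B);
[ia, ja, va] = find(A);
[ib, jb, vb] = find(B);
rows = bsxfun(@plus, (ia.' - 1)*mb, ib);   % (ia-1)*mb + ib for all pairs
cols = bsxfun(@plus, (ja.' - 1)*nb, jb);   % (ja-1)*nb + jb for all pairs
vals = vb * va.';                          % vb(p)*va(q) for all pairs
K = sparse(rows(:), cols(:), vals(:), ma*mb, na*nb);
% norm(K - kron(A,B), 'fro') should be 0: every index pair lands on a
% distinct position, with exactly the same products as kron computes.
```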
One possible way to improve this would be to call matlab.internal.sparse.kronSparse in a blockwise manner and store the intermediate results in the correct positions of the output array, but this would require some careful thought.
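To illustrate the blockwise idea with the public kron (rather than the internal kronSparse, whose signature I don't want to guess at), one can process one row of B at a time: rows i, i+mb, i+2*mb, ... of kron(A,B) depend only on row i of B, so each row block can be computed and scattered independently. A small-scale sketch:

```matlab
% Blockwise kron sketch: one row of B per iteration.
A = sprand(40, 50, 0.05); B = sprand(30, 20, 0.05);   % small example sizes
[ma, na] = size(A); [mb, nb] = size(B);
K = sparse(ma*mb, na*nb);
for i = 1:mb
    K(i:mb:end, :) = kron(A, B(i, :));   % row block coming from row i of B
end
% norm(K - kron(A,B), 'fro') should be 0 for these sizes
```

The peak memory per iteration is that of one ma-by-(na*nb) sparse block instead of the full product, at the cost of mb indexed assignments into K.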
BTW, I tried the FEX submission that Ander mentioned (KronProd), but it provides no benefit when you need to compute kron(A,B) + kron(C,D) (although it's amazing for kron(A,B)\b situations).