Some discussion and code
The trick here is to perform the norm-calculations
with numeric arrays and save the results into a cell array version as late as possible. For performing the norm-calculations
you can take help of ndgrid
, bsxfun
and some permute + reshape
to give it the "shape" as needed for the final cell array version. So, here's the vectorized approach to perform these tasks -
%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);
%// Get the norm values
normvals = sqrt(bsxfun(@minus,xi(:),xi(:).').^2 + ...
bsxfun(@minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2);
%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);
%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));
Benchmarking and runtimes
After improving the original loopy code with pre-allocation for X
and function-inling f
, runtime-benchmarks
were performed with it against the proposed vectorized approach with datasizes as w, h = 60
and the runtime results thus obtained were -
----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.
This suggested a whooping close to 20x
speedup with the proposed solution!
For extremely huge datasizes
If you are dealing with huge datasizes, essentially you are not giving enough memory for bsxfun
to work with, and bsxfun
is known to use up a lot of memory for giving you a performance-efficient vectorized solution. So, for such huge-datasize cases, you can use the following loopy approach to replace normvals
calculations that was listed in the earlier bsxfun
based solution -
%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end