2

I have a scalar function f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2) which receives two 2-dimensional vectors as input (norm here implements the Euclidean norm). The values of x,i range in 1:w and the values y,j range in 1:h. I want to create a cell array X such that X{x,y} will contain a w x h matrix such that X{x,y}(i,j) = f([x,y],[i,j]). This can obviously be done using 4 nested loops like so:

for x=1:w;
for y=1:h;
    X{x,y}=zeros(w,h);
    for i=1:w
        for j=1:h
            X{x,y}(i,j)=f([x,y],[i,j])
        end 
    end
end
end

This is however extremely inefficient. I would very much appreciate an efficient way to create X.

Divakar
  • 218,885
  • 19
  • 262
  • 358
Serpahimz
  • 175
  • 1
  • 8
  • 1
    What does the function itself do? You ought to first see if you can make it `f(w, h)` (outputting the h x w matrix directly) instead. – nkjt Dec 19 '14 at 12:32
  • @nkjt The function is f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2), sigma being a predefined parameter. – Serpahimz Dec 19 '14 at 12:53
  • 1
    if you write a vectorized function for `f` then you can remove fro loops. It is better to add your function to the question. – NKN Dec 19 '14 at 13:26

3 Answers3

1

The one way to do this is to remove the 2 innermost loops and replace then with a vectorised version. By the look of your f function this shouldn't be too bad

First we need to construct two matrices containing the 1 to w on every row and 1 to h on every column like so

wMat=repmat(1:w,h,1);
hMat=repmat(1:h,w,1)';

This is going to represent the inner two loops, and the transpose will allow us to get all combinations. Now we can vectorise the calculation (f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2)):

for x=1:w;
    for y=1:h;
        temp1=sqrt((x-wMat).^2+(y-hMat).^2);
        X{x,y}=exp(temp1/(sigma^2));
    end
end

Where we have computed the Euclidean norm for all pairs of nodes in the inner loops at once.

user2539336
  • 180
  • 6
1

Some discussion and code

The trick here is to perform the norm-calculations with numeric arrays and save the results into a cell array version as late as possible. For performing the norm-calculations you can take help of ndgrid, bsxfun and some permute + reshape to give it the "shape" as needed for the final cell array version. So, here's the vectorized approach to perform these tasks -

%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);

%// Get the norm values
normvals = sqrt(bsxfun(@minus,xi(:),xi(:).').^2 + ...
                                bsxfun(@minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2); 

%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);

%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));

Benchmarking and runtimes

After improving the original loopy code with pre-allocation for X and function-inling f, runtime-benchmarks were performed with it against the proposed vectorized approach with datasizes as w, h = 60 and the runtime results thus obtained were -

----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.

This suggested a whooping close to 20x speedup with the proposed solution!


For extremely huge datasizes

If you are dealing with huge datasizes, essentially you are not giving enough memory for bsxfun to work with, and bsxfun is known to use up a lot of memory for giving you a performance-efficient vectorized solution. So, for such huge-datasize cases, you can use the following loopy approach to replace normvals calculations that was listed in the earlier bsxfun based solution -

%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
    normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • I must admit that I am not familiar with ndgrid,bsxfun,permute so I will have to read their documentation in order to properly understand the code but a 20x speedup is definitely impressive. Small question, what exactly do you refer to by "Improved loopy code", is it just the correction to the pre-allocation of X? – Serpahimz Dec 19 '14 at 19:16
  • @Serpahimz By `improved loopy code`, I meant two changes in your original code - 1) Using `X = cell(w,h);` at the start and 2) `X{x,y}(i,j) = exp(-norm([x,y]-[i,j])^2/sigma^2);` at the norm calculation part inside the loops. These are meant as good practices for performance, that's why calling it "improved". – Divakar Dec 19 '14 at 19:17
  • @Serpahimz Also, I would be dearly interested in what kinda speedups you would get with this proposed solution for your actual case! – Divakar Dec 19 '14 at 19:19
  • I will time it and report the results at some point tomorrow :) Thank you very much for your help! – Serpahimz Dec 19 '14 at 19:24
  • Hey Divkar, I just had the opportunity to test the code you suggested for its actual purpose and I'm afraid I ran into difficulties. The actual values of w,h are anywhere between 500 and 1500 and attempting to run the script you wrote with those values gives the following error: Error using bsxfun Out of memory. Type HELP MEMORY for your options. I have 16gig RAM and 30gig of swap memory available (on an SSD drive). If it's not possible to perform the processing in the RAM is it maybe somehow possible to do it on my SSD drive? – Serpahimz Dec 20 '14 at 17:55
  • @Serpahimz Check out the added For extremely huge datasizes section? The performance must still hold good enough I would think. – Divakar Dec 22 '14 at 09:19
0

It seems to me that when you run through the cycle for x=w, y=h, you are calculating all the values you need at once. So you don't need recalculate them. Once you have this:

for i=1:w
    for j=1:h
        temp(i,j)=f([x,y],[i,j])
    end 
end

Then, e.g. X{1,1} is just temp(1,1), X{2,2} is just temp(1:2,1:2), and so on. If you can vectorise the calculation of f (norm here is just the Euclidean norm of that vector?) then it will get even simpler.

nkjt
  • 7,825
  • 9
  • 22
  • 28
  • Notice that for any given x,y pair the internal loop on i,j creates a w x h matrix which depends on x,y. The combination of the four loops creates a total of h*w matrices each of size w x h which are all different from each other. You can think of x,y as the central point of the matrix stored in X{x,y} around which the values are calculated when I loop on i,j. Regardless, the reduction you suggest doesn't hold unfortunately. – Serpahimz Dec 19 '14 at 15:41
  • For better efficiency, you should swap the two loops, i.e. make the inner one the outer one and vice versa; [MATLAB is column-major](http://stackoverflow.com/questions/27491555/why-if-matlab-is-column-major-do-some-functions-output-row-vectors). – jub0bs Dec 19 '14 at 17:44