I've used spmd to run two pieces of code simultaneously. The computer I'm using has a processor with 8 cores, which should mean the communication overhead is close to zero.
I compared the running time of this spmd block with the same code outside of spmd, using tic and toc.
When I run the code, the parallel version takes more time than the sequential one.
Any idea why that is?
Here is a sample of the code I'm talking about:

tic;
spmd
    if labindex == 1
       gamma = (alpha*beta);
    end
    if labindex == 2
        for t = 1:T,
            for i1=1:n
                for j1=1:n
                    kesi(i1,j1,t) = (alpha(i1,t) + phi(j1,t));
                end;
            end;
        end;
    end
end
t_spmd = toc;


tic;
gamma2 = (alpha * beta);
for t = 1:T,
    for i1=1:n
        for j1=1:n
            kesi2(i1,j1,t) = (alpha(i1,t) + phi(j1,t));
        end;
    end;
end;
t_seq = toc;
disp('t spmd : ');disp(t_spmd);
disp('t seq : ');disp(t_seq);

1 Answer

There are two reasons here. Firstly, your use of if labindex == 2 means that the main body of the spmd block is being executed by only a single worker - there's no parallelism here.
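
To illustrate the first point, here is a minimal sketch (not the asker's code) in which every worker computes a share of the t slices instead of one worker doing all of the loop work. The sizes and the round-robin split are assumptions chosen for illustration, and it needs Parallel Computing Toolbox with a pool available. For arrays this small the overhead of the pool will most likely still outweigh any gain.

```matlab
% Illustrative sizes (assumed, not taken from the question)
T = 10; n = 7;
alpha = rand(n, T);
phi   = rand(n, T);

spmd
    % Each worker takes every numlabs-th slice, starting at its own index
    myT  = labindex:numlabs:T;
    part = zeros(n, n, numel(myT));
    for k = 1:numel(myT)
        t = myT(k);
        for i1 = 1:n
            for j1 = 1:n
                part(i1, j1, k) = alpha(i1, t) + phi(j1, t);
            end
        end
    end
end

% On the client, part and myT are Composites: collect the pieces
kesi = zeros(n, n, T);
for w = 1:length(part)
    kesi(:, :, myT{w}) = part{w};
end
```

Unlike the if labindex == 2 version, no worker sits idle here as long as there are at least as many slices as workers.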

Secondly, it's important to remember that (by default) parallel pool workers run in single computational thread mode. Therefore, when using local workers, you can only expect speedup when the body of your parallel construct cannot be implicitly multi-threaded by MATLAB.
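
One way to see this on your own machine is to compare the computational thread count on the client with the count on a worker. This is only a sketch: it assumes Parallel Computing Toolbox and an open (or automatically started) local pool, and the variable names are made up for the example.

```matlab
% Threads available to the client MATLAB session
clientThreads = maxNumCompThreads;

spmd
    % Threads available to each worker (typically 1 on a default local pool)
    workerThreads = maxNumCompThreads;
end

% workerThreads is a Composite on the client; read the value from worker 1
fprintf('client: %d thread(s), worker 1: %d thread(s)\n', ...
        clientThreads, workerThreads{1});
```

If the client reports several threads and the worker reports one, an operation like alpha*beta may already be multi-threaded in the sequential run, so moving it to a single worker cannot be expected to make it faster.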

Finally, in this particular case, you're much better off using bsxfun (or implicit expansion in R2016b or later), like so:

T       = 10;
n       = 7;
alpha   = rand(n, T);
phi     = rand(n, T);
alpha_r = reshape(alpha, n, 1, T);
phi_r   = reshape(phi, 1, n, T);
% In R2016b or later:
kesi    = alpha_r + phi_r;
% In R2016a or earlier:
kesi    = bsxfun(@plus, alpha_r, phi_r);
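
As a quick sanity check, the sketch below compares the triple loop from the question with the bsxfun form on small random inputs. The names kesi_loop and kesi_vec are introduced here for illustration, and the sizes are the same assumed values as above.

```matlab
T = 10; n = 7;
alpha = rand(n, T);
phi   = rand(n, T);

% Loop version from the question, with the output preallocated
kesi_loop = zeros(n, n, T);
for t = 1:T
    for i1 = 1:n
        for j1 = 1:n
            kesi_loop(i1, j1, t) = alpha(i1, t) + phi(j1, t);
        end
    end
end

% Vectorized version (bsxfun is available in both old and new releases)
kesi_vec = bsxfun(@plus, reshape(alpha, n, 1, T), reshape(phi, 1, n, T));

% Both forms perform the same element-wise additions
assert(isequal(size(kesi_vec), [n n T]));
assert(isequal(kesi_loop, kesi_vec));
```

Wrapping each half in tic and toc with larger n and T should show how the two approaches compare on a given machine.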
Edric
  • Sir, do you have an official reference for this part of your answer? "(by default) parallel pool workers run in single computational thread mode. Therefore, when using local workers, you can only expect speedup when the body of your parallel construct cannot be implicitly multi-threaded by MATLAB." –  Jan 18 '17 at 04:59
  • This R2016b release note implies it: http://www.mathworks.com/help/distcomp/release-notes.html#bvczl_g-1 but I can't find a more concrete doc link, I'm afraid. It's definitely been the case since the first release of Parallel Computing Toolbox back in 2005 (or was it 2004). – Edric Jan 18 '17 at 09:01