
I wrote a finite element toolbox in MATLAB which turned out to be rather slow for large meshes, so I decided to parallelize the assembly of the element matrices.

Hence, after starting the worker pool I use cell arrays to build the global matrices, following the helpful advice in https://es.mathworks.com/matlabcentral/answers/203734-most-efficient-way-to-add-multiple-sparse-matrices-in-a-loop-in-matlab

% Ne is the number of elements in the mesh
% Mij: cell array to store mass matrix components Mij{k}
% Kij: cell array to store stiffness matrix components Kij{k}
% Fi: cell array to store RHS vector components Fi{k}
Mij = cell(Ne, 1);
Kij = cell(Ne, 1);
Fi = cell(Ne, 1);

% stcData is a large structure with the mesh data
% Temp is the temperature field (vector) at time t
parfor k = 1:Ne
    % Mij{k} = [I, J, M], size DOF^2 x 3
    % Kij{k} = [I, J, K], size DOF^2 x 3
    % Fi{k}  = [I, F],    size DOF x 2
    [Mij{k}, Kij{k}, Fi{k}] = ...
        solv_AssemblElementMatrix(k, time, Temp, stcData);
end

Mmat = cell2mat(Mij);
Kmat = cell2mat(Kij);
Fmat = cell2mat(Fi);

% Global matrices assembly
M = sparse(Mmat(:, 1), Mmat(:, 2), Mmat(:, 3), Nx, Nx);
K = sparse(Kmat(:, 1), Kmat(:, 2), Kmat(:, 3), Nx, Nx);
F = sparse(Fmat(:, 1), ones(size(Fmat, 1), 1), Fmat(:, 2), Nx, 1);
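Before digging further I profiled a serial run, to see how much of the total time is actually spent inside `solv_AssemblElementMatrix` (a minimal sketch using MATLAB's standard profiler, assuming `Ne`, `time`, `Temp` and `stcData` are set up as above):

```matlab
% Sketch of a serial profiling run (variables as defined above).
% The profiler report shows the self-time of solv_AssemblElementMatrix;
% by Amdahl's law, that fraction of total time bounds the achievable
% parfor speedup.
profile on
for k = 1:Ne
    [Mij{k}, Kij{k}, Fi{k}] = ...
        solv_AssemblElementMatrix(k, time, Temp, stcData);
end
profile viewer   % inspect time spent in solv_AssemblElementMatrix
```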

I have tried the previous serial code and this parallelized version with 2 workers, with hardly any speedup (around 1.1x).
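To check whether data transfer to the workers is the bottleneck, I also considered measuring the bytes shipped to and from the pool around the loop (a sketch using `ticBytes`/`tocBytes` from the Parallel Computing Toolbox, R2016b or later; variables as above):

```matlab
% Sketch: measure data transferred to/from the pool around the parfor loop.
p = gcp;        % handle to the current parallel pool
ticBytes(p);
parfor k = 1:Ne
    [Mij{k}, Kij{k}, Fi{k}] = ...
        solv_AssemblElementMatrix(k, time, Temp, stcData);
end
tocBytes(p)     % bytes sent/received per worker; large numbers here would
                % suggest stcData is being broadcast to every worker
```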

I hope you can help me locate the problem.

Cris Luengo
  • What is your CPU use? If you have hyperthreading, utilization around 50% is the normal maximum (100% without HT). If you are already near 50% utilization, this `solv_...` function may already be parallel or well vectorized. Functions such as `fft` and `eig` already use everything your CPU has to offer; no need to parallelize in those cases, you will only burn more RAM for no speedup. EDIT: Additionally, if `solv_...` isn't the main bottleneck here, parfor obviously won't help. Profile the code. – Zizy Archer Sep 21 '18 at 10:54
  • You are right, CPU use is around 50% with HT. I will use the profile tool to check the amount of time spent in this `solv_...` function. From your comment I understand that if 30% of serial time is spent in `solv_...`, then the maximum saving I can get by parallelizing this is 30%*(1 - 1/p), where p is the number of workers, is that right? I was thinking about some overhead problem due to data transfer among workers though. – jlorenzo Sep 21 '18 at 12:17
  • I think the problem is that the global stiffness matrix is a shared, mutable resource that has to be locked for each update to be assembled correctly. I can see that it's easy to parallelize element matrix generation, since those are independent, but not the assembly and solver. – duffymo Sep 21 '18 at 12:32
  • 3
    Possible duplicate of [MATLAB parfor is slower than for -- what is wrong?](https://stackoverflow.com/questions/3174358/matlab-parfor-is-slower-than-for-what-is-wrong) – Wolfie Sep 21 '18 at 13:08

0 Answers