I'm solving a PDE using an implicit scheme, which I can divide into two matrices at every time step, that are then connected by a boundary condition (also at every time step). I'm trying to speed up the process by using multi-processing to invert both matrices at the same time.
Here's an example of what this looks like in a minimal (non-PDE-solving) example.
using Distributed
using LinearAlgebra
function backslash(N, T, b, exec)
A = zeros(N,N)
α = 0.1
for i in 1:N, j in 1:N
abs(i-j)<=1 && (A[i,j]+=-α)
i==j && (A[i,j]+=3*α+1)
end
A = Tridiagonal(A)
a = zeros(N, 4, T)
if exec == "parallel"
for i = 1:T
@distributed for j = 1:2
a[:, j, i] = A\b[:, i]
end
end
elseif exec == "single"
for i = 1:T
for j = 1:2
a[:, j, i] = A\b[:, i]
end
end
end
return a
end
b = rand(1000, 1000)
a_single = @time backslash(1000, 1000, b, "single");
a_parallel = @time backslash(1000, 1000, b, "parallel");
a_single == a_parallel
Here comes the problem: the last line evaluate to true, with an 6-fold speed-up, however, only 2-fold should be possible. What am I getting wrong?