
I have been trying to optimise some code that takes an input signal in the form of a 2D array and convolves it with a Gaussian of a certain FWHM. The size of this 2D signal array is (671, 2001), and I am convolving along the second axis, the time axis, where

time = 0:0.5:1000
nt = length(time)  # nt = 2001

I am convolving with a Gaussian of FWHM 150, corresponding to a standard deviation of about 63.7. I then pad the initial signal by 3 times the FWHM, setting the new values before the original start index to 0 and the new values after the original end index to the signal at the last time point.

Hence, nt = 3801 after padding.
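A minimal sketch of this padding scheme (the helper name is mine, not from the original code), using plain arrays:

```julia
# Hypothetical sketch of the padding described above: zeros before the
# original start, and the last sample held constant after the original end.
function pad_signal(S::AbstractMatrix, fwhm::Real, dt::Real)
    npad = round(Int, 3fwhm / dt)          # 3*150/0.5 = 900 samples per side
    nr, nt = size(S)
    padded = zeros(eltype(S), nr, nt + 2npad)
    padded[:, npad+1:npad+nt] .= S         # original signal in the middle
    padded[:, npad+nt+1:end] .= S[:, end]  # hold the last value after the end
    return padded
end

P = pad_signal(rand(671, 2001), 150.0, 0.5)
size(P)  # (671, 3801)
```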

My MATLAB code calculates this convolution in 117.5043 seconds. This is done through a nested for loop over each axis, which should be slow in MATLAB. The main part of the MATLAB code is as follows,

for it = 1:nt
    % scan convolution mask across time
    kernel  = preb*exp(-b*(tc-tc(it)).^2); % convolution mask - Gaussian
    N(it) = sum(kernel)*dt;                % normalise
  
    kernel = kernel/N(it); % ensure convolution integrates to 1

    % calculate convolution for each row
    for iq = 1:Nq
        WWc(iq,it) = dt*sum((WWc0(iq,1:nt).').*kernel(1:nt));
    end
end

I have rewritten this in Julia, except using circulant matrices to construct the Gaussian kernel. This effectively removes the for loop over time, and the kernel to be applied over all time is now fully defined by one matrix. I then either loop over the rows and multiply this kernel by the signal of each row, or use the mapslices function. The main part of the Julia code is as follows,

K = Circulant(G.Fx) # construct circulant matrix of Gaussian function G.Fx
conv = zeros(nr, nt) # init convolved signal

#for i=1:nr # If using a loop instead of mapslices
#    conv[i, :] = transpose(sC[i, :]) * K
#end

conv = mapslices(x -> K*x, sC; dims=2)

In Julia this convolution takes 368 seconds (almost 3 times as slow as MATLAB), despite using circulant matrices to skip a for loop and reducing it down to multiplying arrays of size (1, 3801) and (3801, 3801) for each row.

output of @time:

368.047741 seconds (19.37 G allocations: 288.664 GiB, 14.30% gc time, 0.03% compilation time)

Both are run on the same computer; should Julia not outperform MATLAB here?
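For comparison, a near-literal, type-stable Julia translation of the MATLAB loop above might look like the sketch below. The names `preb`, `b`, `tc` and `dt` mirror the MATLAB variables whose definitions are not shown in the post, so the values used here are illustrative assumptions.

```julia
using LinearAlgebra  # for dot

# Sketch: direct translation of the MATLAB nested loop, kept type stable
# and allocation-light by reusing one kernel buffer.
function conv_loop(WWc0::AbstractMatrix, tc::AbstractVector, preb, b, dt)
    Nq, nt = size(WWc0)
    WWc = zeros(Nq, nt)
    kernel = zeros(nt)
    for it in 1:nt
        # scan the Gaussian convolution mask across time
        kernel .= preb .* exp.(-b .* (tc .- tc[it]).^2)
        kernel ./= sum(kernel) * dt   # ensure the mask integrates to 1
        for iq in 1:Nq
            WWc[iq, it] = dt * dot(@view(WWc0[iq, :]), kernel)
        end
    end
    return WWc
end

tc = collect(0.0:0.5:20.0)  # reduced size purely for illustration
sigma = 150 / (2sqrt(2log(2)))
out = conv_loop(rand(5, length(tc)), tc, 1/(sigma*sqrt(2pi)), 1/(2sigma^2), 0.5)
```

Because the kernel is renormalised at every step, a constant input row should come back unchanged, which makes a handy sanity check.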

  • Are you doing this in a function? Put your implementation into a function in Julia. – mbauman May 21 '21 at 13:46
  • @AnderBiguri I realised in Julia I am just using the standard library's multiplication function; using mul!() from LinearAlgebra provides some improvement. I guess I need to call some LAPACK/BLAS routine to get similar speeds to MATLAB. I am aware of the FFT method, although I struggled to implement it. Taking the FFT of the signal and kernel and then multiplying them together before taking the IFFT kept giving me complex values for the convolved signal. – Kyle May 21 '21 at 14:53
  • Repeating my question: is your Julia implementation in a function? Could you edit your code into a fully working self-contained example? – mbauman May 21 '21 at 15:20
  • @Kyle That is because you had a bug, of course. That is a bad reason to avoid using a better technique, though. Plus, MATLAB has functions that do it for you, such as `conv`. – Ander Biguri May 21 '21 at 15:22
  • @mbauman This part of the code is within a function, to which I pass some signal, a time vector and a value of FWHM, although I am calling two constructor types that are outside my function. Could that make it slow? I have updated the post with more complete code, including the whole function definition. – Kyle May 22 '21 at 11:14
  • "Julia must be faster than MATLAB because the makers of Julia say so." LOL. Their benchmark comparing to MATLAB is awfully deceptive, using terrible programming techniques in MATLAB to make it seem slower. And they justify those techniques by saying "but this is how people would like to program", rather than doing things the way people actually program in MATLAB. Don't believe the hype. – Cris Luengo May 22 '21 at 14:24
  • Related: https://stackoverflow.com/q/51181392/7328782 ; https://github.com/JuliaLang/Microbenchmarks/issues/4 – Cris Luengo May 22 '21 at 14:49
  • @CrisLuengo Thanks for the info. That is disappointing, as speed is meant to be its main selling point and was the whole reason I decided to give Julia a go. To see that the loops are slower than in MATLAB, which is by no means speedy at for loops, is surprising. Rewriting the MATLAB code in terms of circulant matrices actually reduces the runtime to 4 s, vs over 300 s in Julia. Although perhaps I have written my Julia code in a grossly inefficient fashion. – Kyle May 23 '21 at 11:17
  • You might have written the Julia code inefficiently. I don't expect a huge difference between the two. MATLAB also has a JIT, just like Julia. This means that loops in MATLAB are not slow. They used to be, 20 years ago, and people somehow still haven't gotten the message. There are things in MATLAB that still have an overhead, like function calls, which likely is smaller in Julia. But MATLAB certainly has optimized the underlying computation. – Cris Luengo May 23 '21 at 13:15
  • "MATLAB must be faster than Julia because it's expensive." LOL. Bad takes aside, though, I wouldn't expect a huge difference between the two languages in this particular case either; MATLAB's for loop isn't doing much work itself. At a quick glance I can see your Julia code is type unstable (all those type annotations aren't necessary and aren't helping), but it's still not written reproducibly. What are the sizes of the inputs? If you want to change the algorithm, you might as well change to an FFT as suggested above. Otherwise, the Julia code should look fairly similar to MATLAB's. – mbauman May 24 '21 at 14:23
  • @mbauman Yeah, I discovered the main issue was the unstable types; in addition, I was doing matmul with a matrix of type Circulant, which reduces multiplication speed vs type Matrix (I guess BLAS is not called by default in the case of the Circulant type). I have rewritten my Julia code, ensuring it is type stable and getting rid of the unnecessary structs. It is now faster than my MATLAB code: MATLAB runs in 4 seconds, the Julia version in 170 ms (tested using BenchmarkTools). In both codes I am using the same inputs; the padded signal is size (670, 3801) and the kernel is (3801, 3801). – Kyle May 24 '21 at 14:40
  • Any chance you can post the updated code as an answer for posterity? – Oscar Smith May 24 '21 at 18:09
  • xref: https://discourse.julialang.org/t/how-to-make-my-julia-code-fast-like-matlab/61678 – carstenbauer May 25 '21 at 07:04
  • @OscarSmith I have now edited the post to include the new code :) – Kyle May 25 '21 at 15:54
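For posterity, the FFT route discussed in the comments can be made to work: the complex values Kyle saw are expected from a plain `ifft`, and using the real-input transforms `rfft`/`irfft` (or taking `real(ifft(...))`) recovers a real signal. A minimal sketch, assuming the FFTW.jl package is available (this is not from the original post):

```julia
using FFTW

# Sketch: circular convolution of each row with a same-length kernel via the
# FFT. rfft/irfft operate on real input and return real output, avoiding the
# spurious complex values mentioned in the comments above.
function fft_convolve(S::AbstractMatrix{<:Real}, kernel::AbstractVector{<:Real})
    n = length(kernel)
    @assert size(S, 2) == n
    Kf = rfft(kernel)             # half-spectrum of the (periodic) kernel
    out = similar(S, Float64)
    for i in axes(S, 1)
        # multiply spectra, then invert with the real-output transform
        out[i, :] = irfft(rfft(S[i, :]) .* Kf, n)
    end
    return out
end
```

With a discrete delta as the kernel, this circular convolution reduces to the identity, which is a convenient correctness check before plugging in the padded Gaussian.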

1 Answer


Working fast code, with a wrapper for convolving a random signal.

using LinearAlgebra, SpecialMatrices

function wrapper()
    signal = rand(670, 2001)
    tvec = 0:0.5:1000
    fwhm = 150.0
    ctype = 0 # 0 = Gaussian, 1 = Lorentzian
    conv = convolve(signal, tvec, fwhm, ctype)
end


function Gaussian(x:: StepRangeLen{Float64}, x0:: Float64, fwhm:: T) where {T<:Union{Float64, Int64}}
    sigma = fwhm/2sqrt(2log(2))
    height = 1/(sigma*sqrt(2pi))
    Fx = height .* exp.( (-1 .* (x .- x0).^2)./ (2*sigma^2))
    Fx = Fx./sum(Fx) # renormalise so the discrete kernel sums to 1
end


function Lorentzian(x:: StepRangeLen{Float64}, x0:: Float64, fwhm:: T) where {T<:Union{Float64, Int64}}
    gamma = fwhm/2
    Fx = 1 ./ ((pi * gamma) .* (1 .+ ((x .- x0)./ gamma).^2 ))
    Fx = Fx./sum(Fx)
end


function generate_kernel(tconv:: StepRangeLen{Float64}, fwhm:: Float64, centre:: Float64, ctype:: Int64)
    if ctype == 0
        K = Gaussian(tconv, centre, fwhm)
    elseif ctype == 1
        K = Lorentzian(tconv, centre, fwhm)
    else
        error("Only Gaussian and Lorentzian functions currently available.")
    end
    K = Matrix(Circulant(K))
    return K 
end


function convolve(S:: Matrix{Float64}, time:: StepRangeLen{Float64}, fwhm:: Float64, ctype:: Int64)
    nr:: Int64, nc:: Int64 = size(S)
    dt:: Float64 = sum(diff(time, dims=1))/(length(time)-1)
    nt:: Int64 = length(time)
    duration:: Float64 = fwhm*3 # must pad by three times the FWHM to deal with edges
    tmin:: Float64 = minimum(time); tmax:: Float64 = maximum(time)
    if dt != diff(time, dims=1)[1] ; error("Non-linear time range."); end
    if nr != nt && nc != nt; error("Inconsistent dimensions."); end
    if nc != nt; S = copy(transpose(S)); nr, nc = nc, nr; end # column always time axis

    tconv_min:: Float64 = tmin - duration ; tconv_max:: Float64 = tmax + duration
    tconv = tconv_min:dt:tconv_max # extended convolution time vector
    ntc = length(tconv)
    padend:: Int64 = (duration/dt) # index at which 0 padding ends and signal starts
    padstart:: Int64 = (padend + tmax / dt) # index where signal ends and padding starts

    K = generate_kernel(tconv, fwhm, tconv[padend], ctype)

    sC = zeros(Float64, nr, ntc)
    sC[1:nr, 1:padend] .= @view S[1:nr, 1]       # pad start with the first signal value
    sC[1:nr, padend:padstart] .= S               # original signal
    sC[1:nr, padstart:end] .= @view S[1:nr, end] # pad end with the last signal value

    conv = convolution_integral(sC, K, dt)

    S .= conv[:, 1:nt] # return the convolved signal with its original size

    return S
end


function convolution_integral(signal:: Matrix{Float64}, kernel:: Matrix{Float64}, dt:: Float64)
    LinearAlgebra.BLAS.set_num_threads(Sys.CPU_THREADS)
    conv = signal * kernel
    conv .*= dt
    return conv 
end

Credit to @Kyle

Mark
  • Just two tips: firstly, all the type signatures are needlessly restrictive, and limit the usability of the code without any performance benefit. Secondly, the unusual placement of the `end` keyword is quite distracting and non-idiomatic. It makes the code look like Python, which is not great. Otherwise – DNF Jun 16 '23 at 22:06
  • I just edited it to put the `end`s on new lines :-). I get what you mean about the type signatures- they are quite a lot- I didn't remove them right now because I assume if someone comes to this new, the types of the various parameters helps with parsing what the functions actually do. Happy either way though! – Mark Jun 16 '23 at 22:30
  • Some people do like type signatures, for sure. In this case, I would replace `StepRange` with `AbstractArray` and the `Float64`/`Int64` stuff with `Real` (or `<:Real`). No reason to disallow `Vector` or `Float32`. – DNF Jun 16 '23 at 22:36