7

I am a new julia user and I am trying to get a feeling on what is the best pratice to make fast code in julia. I am mainly making element wise operations in arrays/matrices. I tried a few codes to check which one allow me to get the higher speed

fbroadcast(a,b) = a .*= b;

function fcicle(a,b)
   @inbounds @simd for i in eachindex(a)
      a[i] *= b[i];
   end
end
a = rand(100,100);
b = rand(100,100);
@btime fbroadcast(a,b)
@btime fcicle(a,b)

Using the function with the for achieved a speed about 2x of the broadcast version. What are the differences between both cases? I would expect the broadcasting to internally loop the operations very similarly to what I did on the fcicle. Finally, is there any way of achieve the best speed with a short syntax like a .*= b?

Many thanks, Dylan

Dylan
  • 107
  • 6
  • 1
    Note that `fbroadcast` does inherently have some overhead (ie it has to check that the indexes are the same on `a` and `b`). The factor of 2 for largish arrays though is strange. – Oscar Smith Feb 25 '20 at 22:05
  • 100 x 100 isn't really largish. For 1000x1000 the difference is already only about 20% on my machine. – carstenbauer Feb 25 '20 at 22:15
  • For 10_000x10_000 it's about 8%. – carstenbauer Feb 25 '20 at 22:26
  • Checking the size of arrays is _really_ fast (and O(1)). And you should anyway do that in the loop version too since you use `@inbounds`! You should also try interpolating `a` and `b` into the benchmarking expressions. – DNF Feb 25 '20 at 22:39
  • For 1000x1000 I still have a difference of about 2x. After some more testing I think that the main difference is the way that the array is accessed. On the fcicle the array is read as a vector while in the broadcasting I assumed it is read as a matrix. If I use a = rand(1000 * 1000) and b = rand(1000 * 1000) the time is about the same – Dylan Feb 25 '20 at 22:56
  • 1
    FWIW The slow-down of the broadcasted version seems to only happen in the case of multi-dimensional arrays. For simple vectors, both versions yield identical performances. – François Févotte Feb 26 '20 at 02:44

1 Answers1

4

I wouldn't have expected this. Could it be a performance bug?

In the meantime, broadcast performance issues exhibited in this case only seem to appear for 2D arrays. The following looks like an ugly hack, but it seems to restore performance:

function fbroadcast(a,b)
    a, b = reshape.((a, b), :) # convert a and b to 1D vectors
    a .*= b
end

function fcicle(a,b)
   @inbounds @simd for i in eachindex(a)
      a[i] *= b[i];
   end
end
julia> using BenchmarkTools
julia> a = rand(100, 100);
julia> b = rand(100, 100);

julia> @btime fbroadcast($a, $b);
  121.301 μs (4 allocations: 160 bytes)

julia> @btime fcicle($a, $b);
  122.012 μs (0 allocations: 0 bytes)
François Févotte
  • 19,520
  • 4
  • 51
  • 74