
I am reading Julia performance tips, https://docs.julialang.org/en/v1/manual/performance-tips/

At the beginning, it mentions two examples.

Example 1,

julia> x = rand(1000);

julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_global()
0.009639 seconds (7.36 k allocations: 300.310 KiB, 98.32% compilation time)
496.84883432553846

julia> @time sum_global()
0.000140 seconds (3.49 k allocations: 70.313 KiB) 
496.84883432553846

We see a lot of memory allocations.

Now example 2,

julia> x = rand(1000);

julia> function sum_arg(x)
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_arg(x)
0.006202 seconds (4.18 k allocations: 217.860 KiB, 99.72% compilation time)
496.84883432553846

julia> @time sum_arg(x)
0.000005 seconds (1 allocation: 16 bytes)
496.84883432553846

We see that by passing x as an argument to the function, the memory allocations almost disappear and the speed is much faster.

My questions are, can anyone explain:

  1. why does example 1 need so many allocations, and why does example 2 not need as many allocations as example 1? I am a little confused.

  2. in the two examples, we see that the second time we run Julia, it is always faster than the first time. Does that mean we need to run Julia twice? If Julia is only fast on the second run, then what is the point? Why doesn't Julia just compile first and then run, just like Fortran?

  3. Is there any general rule for preventing memory allocations? Or do we always have to use @time to identify the issue?

Thanks!

CRquantum
  • Actually, example 2 needs zero allocations. The 16B allocation you see is just because of the `@time` macro itself. For more accurate benchmarking, you can use BenchmarkTools.jl – DNF Jul 29 '21 at 11:01

1 Answer


why does example 1 need so many allocations, and why does example 2 not need as many allocations as example 1?

Example 1 needs so many allocations because x is a global variable (defined outside the scope of the function sum_global). Therefore the type of x can potentially change at any time, i.e. it is possible that:

  1. you define x and sum_global
  2. sum_global gets compiled
  3. you redefine x (change its type) and run sum_global

In particular, as Julia supports multithreading, both actions in step 3 could in general even happen in parallel (i.e. you could change the type of x in one thread while sum_global is running in another thread).

So, because the type of x can change after sum_global has been compiled, Julia has to ensure that the compiled code does not rely on the type that x had when the compilation took place. Instead, in such cases, Julia allows the type of x to change dynamically. However, this dynamic nature of x means that it has to be checked at run time (not compile time), and this dynamic checking of x is what causes the performance degradation and the memory allocations.
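
You can see this instability directly with the @code_warntype macro (a diagnostic sketch, assuming the x and sum_global definitions from the question are in place): variables whose type cannot be inferred are reported as Any.

julia> @code_warntype sum_global()  # in the report, `s` and the values read
                                    # from `x` are inferred as `Any`, because
                                    # the type of the non-const global `x` is
                                    # unknown at compile time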

You could have fixed this by declaring x to be a const (a const ensures that the type of x cannot change):

julia> const x = rand(1000);

julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_global() # this is now fast
  0.000002 seconds
498.9290555615045
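
As a side note (not part of the original answer): Julia 1.8 added typed globals, so in a fresh session you could instead annotate x with a concrete type. Unlike a const, such a global can still be reassigned, just not to a value of a different type; a minimal sketch:

julia> x::Vector{Float64} = rand(1000);  # typed global (requires Julia ≥ 1.8)

julia> x = rand(500);  # OK: still a Vector{Float64}

julia> x = "hello"     # errors: a String cannot be converted to Vector{Float64}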

Why doesn't Julia just compile first and then run, just like Fortran?

This is exactly what Julia does. The benefit of Julia, however, is that it compiles automatically when needed. This allows for a smooth interactive development process.

If you wanted to, you could compile the function before it is run with the precompile function, and then run it separately. Normally, however, people just run the function without doing this explicitly.
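
For example (a sketch assuming the sum_arg and x from the question), precompile takes the function and a tuple of the argument types:

julia> precompile(sum_arg, (Vector{Float64},))  # compile the Vector{Float64} method now
true

julia> @time sum_arg(x)  # the first call should now pay little or no compilation cost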

The consequence is that if you use @time:

  1. The first time you run a function it reports both execution time and compilation time (and, as you can see in the examples you pasted, it tells you what percentage of the time was spent on compilation).
  2. On consecutive runs the function is already compiled, so only the execution time is reported.
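
As DNF notes in a comment above, if you want timings that exclude compilation (and the small overhead of @time itself), the usual tool is BenchmarkTools.jl; a minimal sketch, assuming the package is installed:

julia> using BenchmarkTools

julia> @btime sum_arg($x)  # `$x` interpolates the global into the benchmark;
                           # the call is run many times and the minimum time
                           # is reported, with compilation excluded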

Is there any general rule for preventing memory allocations?

These rules are given exactly in the Performance Tips section of the manual that you quote in your question. The tip on using @time is a diagnostic tip there; all the other tips are rules recommended for getting fast code. The list is long, however, so in my experience a shorter list that is good enough to start with is (see the sketch after the list):

  1. Avoid global variables
  2. Avoid containers with abstract type parameters
  3. Write type stable functions
  4. Avoid changing the type of a variable
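
To make these rules concrete, here is a small sketch of my own (not from the original answer) contrasting the patterns:

julia> v = Real[1, 2.5, 3];        # rule 2 violated: abstract element type Real

julia> function unstable(v)
           s = 0                   # rule 4 violated: `s` starts as an Int ...
           for x in v
               s += x              # ... and may turn into a Float64 mid-loop
           end
           return s
       end;

julia> function stable(v)
           s = zero(eltype(v))     # `s` keeps one concrete type when `v` is concrete
           for x in v
               s += x
           end
           return s
       end;

julia> @code_warntype stable(rand(1000))  # inferred cleanly; compare with unstable(v)
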
Bogumił Kamiński
  • Thank you so much! Your answer is excellent! I just have one more question. Yes, the first run of Julia may be slow, and the second run can be much faster because things have already been compiled. If, in Windows, in the cmd window, I run Julia two times, the first time is slow, the second time is fast. Then I close the cmd window. Then I open a new cmd window, and I run the jl file again. This time, is the code still fast? I mean, is this a third run which should be fast, or is it just a new first run which can be slow? – CRquantum Jul 29 '21 at 09:56
  • In addition, I mean, for example, I have code that needs to do iterations, and there is a big do loop, say looping 1000 times. Is it only the first iteration that is slow because Julia needs to compile all the functions, or are all 1000 iterations slow? – CRquantum Jul 29 '21 at 10:00
  • The JIT compiled code is not stored between sessions - if you restart Julia, every method you define will have to be recompiled upon first call again. A loop like `for i in 1:1000; my_function(i); end` will result in `my_function(i)` being compiled in the first iteration, with the remaining 999 iterations not incurring compilation cost. This is rather the point: for expensive, long-running calculations, a 1s compilation overhead on the first call is easily amortized by efficient code generation over the entire runtime. – Nils Gudat Jul 29 '21 at 10:08
  • For complicated and time-consuming code, the time difference between the first run and the second run should become more and more negligible, right? Like, if a code's Fortran compiling time is 10 seconds and its run time is an hour, then in total I can say Fortran takes 1 hour to run because the 10 seconds of compiling time is negligible. But if we are to compare the timing with Julia, we should only compare it with Julia's first run, right? On the other hand, if the 1st Julia run took 1 hour, then I guess the 2nd run took roughly the same time. It is unlikely that the 1st run takes 1 hour and the second run only takes 1 second. – CRquantum Jul 29 '21 at 10:09
  • 1. Yes - code compilation takes "seconds" not "hours", so if your code runs for 1 hour it is mostly execution time. 2. Nils is right about what happens in interactive work (and this is 99.9% of the usage of Julia - people choose it because it is interactive, which is its strong point). However, you CAN compile Julia in a similar way to how you are thinking of compiling Fortran. It can be done using https://github.com/JuliaLang/PackageCompiler.jl. However, it is a more advanced technique, so I did not want to complicate the answer with this aspect (the compiled code does not even require Julia to be installed). – Bogumił Kamiński Jul 29 '21 at 10:16