6

I am porting this Python code...

with open(filename, 'r') as f:
    results = [np.array(line.strip().split(' ')[:-1], float)
               for line in filter(lambda l: l[0] != '#', f.readlines())]

...to Julia. I came up with:

results = [map(ss -> parse(Float64, ss), split(s, " ")[1:end-1])
        for s in filter(s -> s[1] !== '#', readlines(filename))];

The main reason for this porting is a potential performance gain, so I timed the two snippets in a Jupyter notebook:

  • Python, with %%timeit: 12.8 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each).
  • Julia, with @benchmark: (among other things) mean time: 8.250 ms (2.62% GC). So far so good; I do get a performance boost.
  • However, with @time I get something along the lines of 0.103095 seconds (130.44 k allocations: 11.771 MiB, 91.58% compilation time). From this thread I inferred that this was probably caused by my -> function declarations.

Indeed, if I replace my code with:

filt = s -> s[1] !== '#';
pars = ss -> parse(Float64, ss);
res = [map(pars, split(s, " ")[1:end-1])
        for s in filter(filt, readlines(filename))];

and time only the last line, I get a more encouraging 0.073007 seconds (60.58 k allocations: 7.988 MiB, 88.33% compilation time); hurray! However, this somewhat defeats the purpose (at least as I understand it) of anonymous functions and might lead to a proliferation of f1, f2, f3, ... In Python, giving my lambda a name outside the list comprehension does not seem to affect the runtime.

My question is: to get normal performance, should I systematically name my Julia functions? Note that this particular snippet is to be called in a loop over ~30k files. (Basically, I am reading files that are mixtures of space-separated floats and comment lines, as in the hypothetical example below; each float line can have a different length, and I am not interested in the last element of the line. Any comments on my solutions are appreciated.)
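
For illustration, such a file might look like this (hypothetical contents; comment lines start with '#', float lines have irregular lengths, and the trailing element of each float line is to be discarded):

# a comment line
1.5 2.0 3.25 0
4.0 5.5 0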

(Side comment: wrapping s with strip completely messes up @benchmark, adding 10 ms to the mean, but it does not seem to affect @time. Any idea why?)


Putting everything in a function, as suggested by DNF, fixes my "having to name my anonymous functions" problem. Using one of Vincent Yu's formulations:

function results(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(line, ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     7.108 ms (0.00% GC)
  median time:      7.458 ms (0.00% GC)
  mean time:        7.580 ms (1.58% GC)
  maximum time:     9.538 ms (14.84% GC)
  --------------
  samples:          659
  evals/sample:     1

@time called on this function returns equivalent results after the first compilation run. I am happy with that.

However, my issue with strip persists:

function results_strip(filename::String)::Vector{Vector{Float64}}
    [[parse(Float64, s) for s in @view split(strip(line), ' ')[1:end-1]]
        for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark results_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  3.74 MiB
  allocs estimate:  1465
  --------------
  minimum time:     15.155 ms (0.00% GC)
  median time:      15.742 ms (0.00% GC)
  mean time:        15.885 ms (0.75% GC)
  maximum time:     19.089 ms (10.02% GC)
  --------------
  samples:          315
  evals/sample:     1

The median time doubles. If I time strip alone:

function only_strip(filename::String)
    [strip(line) for line in Iterators.filter(!startswith('#'), eachline(filename))]
end

@benchmark only_strip(FN)
BenchmarkTools.Trial: 
  memory estimate:  1.11 MiB
  allocs estimate:  475
  --------------
  minimum time:     223.868 μs (0.00% GC)
  median time:      258.227 μs (0.00% GC)
  mean time:        325.389 μs (9.41% GC)
  maximum time:     56.024 ms (75.09% GC)
  --------------
  samples:          10000
  evals/sample:     1

The figures just do not add up. Could there be a type mismatch? Should I cast the results to something else?

Aubergine
  • Anonymous functions are just as fast as named functions, whether stored in a variable or not. I'm guessing it's something to do with how you're timing it. Perhaps you're timing compilation time, or using globals, or one of the other performance gotchas. Best to post the complete code you are using to time your routine. As an example of the difference it can make, check the answer at https://stackoverflow.com/questions/51102221/difference-between-benchmark-and-time-macro-in-julia?noredirect=1&lq=1 – Colin T Bowers Jun 24 '21 at 03:53
  • Remember to read the performance tips: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips The issues you're having are probably due to benchmarking in global scope. Remember to put your code in a function. – DNF Jun 24 '21 at 05:33
  • @DNF Wrapping everything in a function definitely helps, thank you. – Aubergine Jun 25 '21 at 03:51

2 Answers

7

In order to (hopefully) clearly summarize what Colin T Bowers and DNF commented:

  1. In Julia, anonymous functions are as fast as named functions after compilation.
  2. The difference you are observing is caused by compilation time.
  3. When you use BenchmarkTools.jl (@btime or @benchmark), the time is always measured after compilation. If you just use @time, the reported time includes compilation. You actually get this information in the output (the % of time spent in compilation).
  4. Similarly, if you put the whole expression in a function, it gets compiled only once, while if you evaluate it in top-level scope, it gets compiled every time you run it.

The conclusion is:

  1. If your code is really computationally expensive, compilation time does not matter (it is a one-time constant cost), so you do not have to worry about it when you have a large computation to perform.
  2. However, if your computation is cheap, the compilation time will be noticeable, since every time you introduce a new anonymous function in top-level scope it has to be compiled.

Let me give you a canonical example showing the problem, which will hopefully help you understand the issue better:

julia> x = rand(10^6);

julia> @time count(v -> v < 0.5, x) # a lot of compilation as everything needs to be compiled
  0.033077 seconds (18.34 k allocations: 1.047 MiB, 110.16% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.013155 seconds (5.85 k allocations: 322.655 KiB, 95.92% compilation time)
499921

julia> @time count(v -> v < 0.5, x) # v -> v < 0.5 is a new function - it has to be compiled
  0.017371 seconds (5.85 k allocations: 322.702 KiB, 95.37% compilation time)
499921

julia> f(x) = x < 0.5
f (generic function with 1 method)

julia> @time count(f, x) # f is a new function - it has to be compiled
  0.011609 seconds (5.82 k allocations: 321.351 KiB, 95.85% compilation time)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000596 seconds (2 allocations: 32 bytes)
499921

julia> @time count(f, x) # f was already compiled - we are fast
  0.000621 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is a new callable - it has to be compiled
  0.013751 seconds (7.71 k allocations: 456.232 KiB, 96.03% compilation time)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000504 seconds (2 allocations: 32 bytes)
499921

julia> @time count(<(0.5), x) # <(0.5) is callable already compiled - we are fast
  0.000616 seconds (2 allocations: 32 bytes)
499921

The point is that v -> v > 0.5 is a new function every time you write it: even if you use exactly the same definition, Julia has to create a new anonymous function each time you introduce it in global scope. This is easy to see here:

julia> v -> v > 0.5
#7 (generic function with 1 method)

julia> v -> v > 0.5
#9 (generic function with 1 method)

(note that the number increases - it is a different function each time)

Now have a look at >(0.5):

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

julia> >(0.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

It is the same callable every time - so it has to be compiled only once.
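
For reference, >(0.5) is sugar for the Base.Fix2 wrapper shown in the output above, which is why it is the same callable each time. A quick illustrative REPL check:

julia> f = >(0.5);   # partial application: a Base.Fix2 instance, not a closure

julia> f isa Base.Fix2
true

julia> f(0.7)        # equivalent to 0.7 > 0.5
true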

Finally, if you wrap things in a function, as DNF explained, you have:

julia> test() = v -> v > 0.5
test (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)

julia> test()
#11 (generic function with 1 method)

As you can see, when the anonymous function is defined within a named function, the compiler knows it is the same anonymous function every time, so the number does not increase (it has to be compiled only once, the first time test is called).
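
A minimal sketch of how to exploit this in practice (count_lt_half is a made-up name; the anonymous function is part of the method body, so it is compiled only on the first call):

x = rand(10^6)

count_lt_half(x) = count(v -> v < 0.5, x)  # the closure is fixed at definition time

@time count_lt_half(x)  # first call: includes the one-time compilation cost
@time count_lt_half(x)  # subsequent calls reuse the compiled anonymous function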


Regarding the strip issue: the difference is visible with @btime but not with @time, because under @time the cost of strip is dwarfed by the cost of compilation, so you are simply unable to see the difference there; the actual runtime overhead of strip is the same in both cases.
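
To isolate the cost of strip, a BenchmarkTools comparison along these lines can be used (line is a made-up sample; the $ interpolation, discussed in the comments below, avoids the overhead of accessing a global variable):

using BenchmarkTools

line = join(1:100, " ")            # hypothetical sample line

@btime split($line, ' ');          # baseline
@btime split(strip($line), ' ');   # with strip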

Bogumił Kamiński
  • I understood the "new function every time" part from the post I linked to, but it's even clearer this way. I expanded a bit on what troubles me with strip! – Aubergine Jun 25 '21 at 04:12
  • When I compare `x = join(1:100, " ")` and then `@btime [parse(Float64, s) for s in @view split($x, ' ')[1:end-1]]` vs `@btime [parse(Float64, s) for s in @view split(strip($x), ' ')[1:end-1]]` I cannot reproduce the difference you observe. The version with `strip` is slower of course (because we do more work), but about 25%. – Bogumił Kamiński Jun 25 '21 at 06:15
  • Testing on x = join(1:100, " ") gives me a 30% time increase with strip vs no strip. Calling strip($x) alone clocks in at less than 0.01% (not 1%) of the array comprehension's time. I wonder what the purpose of the $ sign is? I understand it is used for string interpolation; if I use it in this context without a call to a benchmark macro, it just crashes (unless wrapped in quotes). I use Julia 1.6.0 – Aubergine Jun 27 '21 at 23:29
  • How `$` works with `@btime` is explained here: https://github.com/JuliaCI/BenchmarkTools.jl#quick-start. – Bogumił Kamiński Jun 28 '21 at 07:40
  • @Aubergine After a bit of testing, it looks like the overhead from using `strip` is actually due to its use of `SubString`. To check this, replace `strip` by `SubString` and see if you see the same performance hit. Ideally, `SubString` should have little to no overhead, and I'm not sure why the compiler is unable to optimize it away. – Vincent Yu Jun 30 '21 at 03:24
  • @Vincent Yu I do see the same hit. Do you think I should file a bug report? Note that in the end it is more of a non-issue, as I made the files I am reading myself and they should be sanitized already. But yeah, weird behaviour! – Aubergine Jul 05 '21 at 00:45
  • @Aubergine - it would be best if you submitted a MWE of your case so that it can be investigated. Thank you! – Bogumił Kamiński Jul 05 '21 at 06:42
5

Bogumił Kamiński's answer is excellent. I'm just writing this to comment on your solution.

Note that you can use the DelimitedFiles module from the standard library to read such files into matrices. Like this:

using DelimitedFiles
readdlm(filename, ' ', Float64; comments=true, comment_char='#')

But you may find this slower than your code because it reads data into a column-major matrix instead of a row-based vector of vectors. Which one is better depends on your needs. (And of course, there are many packages that can read delimited files into various structures.)

Regarding your solution, I suggest a few small changes that can improve performance and memory use:

  • Both readlines and filter allocate new vectors that you don't keep. To avoid these memory allocations, use the iterator interface provided by eachline and Iterators.filter.
  • Similarly, indexing with [1:end-1] creates an unnecessary vector. Use view or the convenient @view macro to avoid that allocation (see the short sketch after this list).
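
As a quick illustration of the difference (v is a throwaway example vector):

v = collect(1.0:5.0)

a = v[1:end-1]         # allocates a new 4-element Vector
b = @view v[1:end-1]   # a SubArray that shares memory with v; no copy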

In addition, I think it's clearer in this code to stick to either map or array comprehension instead of mixing the two.

The following code incorporates these changes (using map with do notation). In my test case, this improves speed by about 15% and memory use by about 30%:

results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    map(@view split(line, ' ')[1:end-1]) do s
        parse(Float64, s)
    end
end

If you prefer array comprehensions over map, the following is equivalent:

results = [
    [
        parse(Float64, s)
        for s in @view split(line, ' ')[1:end-1]
    ]
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]

As you point out in the comments, we can use broadcasting to eliminate the explicit inner loop, leading to this cleaner code:

results = map(Iterators.filter(!startswith('#'), eachline(filename))) do line
    parse.(Float64, @view split(line, ' ')[1:end-1])
end

results = [
    parse.(Float64, @view split(line, ' ')[1:end-1])
    for line in Iterators.filter(!startswith('#'), eachline(filename))
]

Vincent Yu
  • Nice. So, in relation to my answer, I would add that `!startswith('#')` will be compiled only once :), as `startswith('#')` uses the pattern I have described in my answer, and Julia is able to recognize that when one adds `!` in front of it, the new anonymous function that is created is also created only once. – Bogumił Kamiński Jun 24 '21 at 09:38
  • Thanks for the suggestions. That's a lot of material to go through! – Aubergine Jun 25 '21 at 04:09
  • I realised I can avoid the nested array comprehension completely by calling parse.(Float64, @view split(line, ' ')[1:end-1]) directly. It does not make much performance difference, but it looks a bit cleaner. I also checked out readdlm; the code snippet you provided crashes in my case, as my rows are of irregular length. I have to pass Any as the type, which returns a matrix with shorter rows padded with empty strings, which I then need to remove by iterating over the rows; this is obviously slower than the other solution, as you expected. – Aubergine Jun 27 '21 at 23:34
  • @Aubergine That's a good point that I missed; broadcasting is indeed cleaner and I've added it to this answer. And I agree that `readdlm` is not suitable if your rows have irregular lengths. – Vincent Yu Jun 30 '21 at 03:10