
I just translated a set of scientific calculations involving matrices whose elements are symbolic expressions; these are differentiated, combined with various other mathematical expressions, and then numerically integrated. The pieces of code below constitute a minimal example that reproduces the performance gap I am experiencing. I understand that differentiating symbolically and then integrating numerically does not make much sense, but again, the point is the performance gap. It's important to note that importing the libraries does not take much time and does not explain the gap.

Julia code:

using Symbolics, QuadGK

@variables x

m = [i * 10*x^3 + 1/i * sin(x) + 5*i*x^3 * cos(x) - 8i*x^2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in 1:500]

m_d = expand_derivatives.(Differential(x).(m))  # differentiate each element symbolically
m_d_expr = build_function(m_d, x)  # build a compiled function for the differentiated matrix
m_d_f = eval(m_d_expr[1])
v = quadgk(m_d_f, 0, 1)
print(v[1])

Python Code:

import numpy as np
from sympy import sin, cos, exp, lambdify
from sympy.abc import x
from sympy.matrices import Matrix
from scipy.integrate import quad

def integrate_matrix(m, v, a, b):
    """Numerically integrate each element of symbolic matrix m over [a, b]."""
    mi = np.zeros((m.rows, m.cols))
    for i in range(m.rows):
        for j in range(m.cols):
            f = lambdify(v, m[i, j])
            integral_value = quad(f, a, b)[0]
            mi[i, j] = integral_value
            
    return mi


m = Matrix([i * 10*x**3 + 1/i * sin(x) + 5*i*x**3 * cos(x) - 8*i*x**2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in range(1, 501)])

v = integrate_matrix(m, x, 0, 1)
print(v)

My question: Is there a way to improve the Julia code to match the Python code's performance? Each time I try to impress my peers with Julia's performance, I get embarrassed. I am still a Julia noob, but I really do not see what to do.

Approximate timing: Python: 6 seconds; Julia: 30+ seconds.

Versions: Julia 1.6, Python 3.7.

Note: I am posting this because of the big gap. And no, the CAS does not explain it all. Moreover, we are doing symbolic differentiation, not symbolic integration, not to mention that sympy is known to be slow. I could add code to time things precisely, but then what? The original scientific code where I encountered the problem took 6 seconds in Python vs 75 seconds in Julia. What a shame.
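For reference, one stdlib-only way to time individual steps (a hypothetical `timed` helper, not part of either script above), so that import cost, `lambdify`/`build_function` cost, and integration cost can be reported separately:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print elapsed wall-clock seconds, return the result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result

# Stand-in workload; in the real scripts this would wrap, e.g.,
# integrate_matrix(m, x, 0, 1) or quadgk's Julia-side equivalent.
total = timed("sum", sum, range(1_000_000))
```

Wrapping each stage this way would make it clear how much of the total is imports versus compute.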

Tarik
  • So, this will probably come down to the CAS library you are using. I doubt what you are actually doing in Python is making a huge difference – juanpa.arrivillaga Apr 13 '21 at 19:25
  • 1
    It would be best to add information about the timings and other performance measures you used to say there is a performance gap, as well as info about how you benchmarked and which versions of the languages and packages you are using. – aramirezreyes Apr 13 '21 at 20:11
  • "Each time I try to impress my piers about Julia performance, I get embarrassed" - kinda same here... You could try profiling your Julia code to see what part of it takes the most time. See: https://opensourc.es/blog/constraint-solver-profiling/ and https://stackoverflow.com/questions/65625496/line-by-line-profiling-of-julia-code-possible. – ForceBru Apr 13 '21 at 20:26
  • I will profile and get back to you. From what I saw, it's slower at every step. The code is small enough for anyone to run and check. – Tarik Apr 13 '21 at 20:27
  • 1
    How are you measuring the time? I just benchmarked `quadgk(m_d_f, 0, 1)`, which takes 264ms (using `@benchmark`) on my computer, vs. `integrate_matrix(m, x, 0, 1)` 1.36 s ± 15.6 ms (using `%timeit` in iPython). It seems that the time you're observing includes compilation time. –  Apr 13 '21 at 20:28
  • 1
    I am not sure I understand your comment. I am seeing that the Julia code runs *faster* after disregarding the compilation time (so, you're paying upfront but getting code that runs faster). –  Apr 13 '21 at 20:33
  • 1
    It's not about cherry-picking, more about your use case and understanding the observed differences. I am not defending either, and it totally depends on your use case: If you need to compile once and run many times, Julia will be faster in the end. If you need to run once, then Python might be a better option. That being said, you may want to explore [PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl) or ask in Julia's discourse if you're still not satisfied. –  Apr 13 '21 at 20:42

1 Answer


"Running the entire thing faster is what any sane person cares about."

As far as I understand, Julia is geared toward running code many times, fast; running it exactly once is always slower, because Julia code needs to be compiled before being executed. Unlike Julia, Python doesn't do any JIT compilation and is always ready to run at the same speed.

Julia 1.6

So, I pasted your Julia code into code.jl and ran it multiple times within the same session:

# New Julia session!
julia> @time include("code.jl")
[long array...]
 24.660636 seconds (42.99 M allocations: 2.607 GiB, 3.82% gc time, 0.02% compilation time)

julia> @time include("code.jl")
[long array...]
  2.761062 seconds (5.61 M allocations: 240.159 MiB, 10.39% gc time, 57.06% compilation time)

julia> @time include("code.jl")
[long array...]
  2.608917 seconds (5.61 M allocations: 240.164 MiB, 4.47% gc time, 61.75% compilation time)

# Restarted Julia
julia> @time include("code.jl")
 25.538249 seconds (42.99 M allocations: 2.607 GiB, 3.76% gc time, 0.02% compilation time)

julia> @time include("code.jl")
  2.740550 seconds (5.61 M allocations: 240.159 MiB, 9.94% gc time, 56.72% compilation time)

So, it takes about 25 seconds to run your code the first time and around 3 seconds (!) to run it again, even though over 50% of those 3 seconds is spent compiling. However, only 0.02% of the initial 25 seconds is reported as compilation time. Apparently, the slowdown isn't attributed to compilation? Also notice how many memory allocations the first run performs: 43 million, vs around 5.6 million (almost 8 times fewer) for the next runs. In any case, the first run is really slow, while subsequent runs are fast.

Loading packages the first time is slow too:

julia> @time using Symbolics
  3.503349 seconds (6.42 M allocations: 460.519 MiB, 3.53% gc time, 0.13% compilation time)

julia> @time using Symbolics
  0.000241 seconds (136 allocations: 9.641 KiB)
  0.000280 seconds (136 allocations: 9.641 KiB)
  0.000249 seconds (136 allocations: 9.641 KiB)
  0.000251 seconds (136 allocations: 9.641 KiB)
  0.000252 seconds (136 allocations: 9.641 KiB)
  0.000246 seconds (136 allocations: 9.641 KiB)

# I didn't import it before,
# but apparently `Symbolics` did
julia> @time using QuadGK
  0.000276 seconds (137 allocations: 9.688 KiB)
  0.000276 seconds (136 allocations: 9.641 KiB)
  0.000240 seconds (136 allocations: 9.641 KiB)
  0.000251 seconds (136 allocations: 9.641 KiB)

That is, 3.5 seconds are spent just running the first line of your code with the imports. Subsequent imports are obviously faster because of caching, I presume.

The first run of the list comprehension is slow as well

julia> @time m = [i * 10*x^3 + 1/i * sin(x) + 5*i*x^3 * cos(x) - 8i*x^2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in 1:500];
  2.590259 seconds (4.69 M allocations: 284.672 MiB, 10.86% gc time, 98.69% compilation time)

julia> @time m = [i * 10*x^3 + 1/i * sin(x) + 5*i*x^3 * cos(x) - 8i*x^2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in 1:500];
  0.102573 seconds (231.21 k allocations: 12.507 MiB, 72.61% compilation time)
  0.098871 seconds (231.21 k allocations: 12.508 MiB, 72.39% compilation time)
  0.108458 seconds (231.21 k allocations: 12.512 MiB, 7.93% gc time, 67.73% compilation time)
  0.099787 seconds (231.22 k allocations: 12.508 MiB, 72.99% compilation time)
  0.098378 seconds (231.21 k allocations: 12.507 MiB, 73.80% compilation time)

Again, slow startup (98.69% compilation time), but the next runs are way faster.


Python 3.9.2

~/t/SO_q $ time python3 thecode.py
________________________________________________________
Executed in    5,88 secs
~/t/SO_q $ time python3 thecode.py
________________________________________________________
Executed in    5,90 secs
Executed in    5,36 secs
Executed in    5,39 secs
Executed in    5,35 secs
Executed in    5,36 secs
Executed in    5,77 secs
Executed in    6,10 secs
Executed in    5,38 secs

Thus, the Python code consistently runs for about 6 seconds, which is 2 times slower than subsequent runs of the Julia code! However, you get this kind of speed as soon as you fire up the Python interpreter, while Julia first spends time compiling code and doing... other stuff that requires 43 million memory allocations. What Julia gives you in exchange for the terrible startup time is the performance of compiled code (here, 2 times faster than Python).


How to make Julia faster

  • Build a custom sysimage. This looks like overkill to me, unless you really need to restart Julia every time to run your code.
  • Simply run your code from the same REPL. The simplest variant of this is to include("your_code.jl") after modifying the code. This may lead to weird errors because the environment will be populated by data from previous runs.
  • Run your code in Pluto, a notebook that also keeps a live Julia session, but is smart about managing the environment.
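For the sysimage route, a minimal sketch using PackageCompiler.jl (the sysimage file name and the choice of packages to bake in are illustrative; adjust to your project):

```shell
# One-time cost: build a custom sysimage with the heavy packages precompiled.
julia -e 'using PackageCompiler; create_sysimage([:Symbolics, :QuadGK]; sysimage_path="sys_symbolics.so")'

# Start Julia with that sysimage: package loading and much of the
# compilation work are already paid for.
julia --sysimage sys_symbolics.so code.jl
```

This trades a long one-time build for faster cold starts afterwards; the sysimage must be rebuilt when the baked-in packages are updated.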
ForceBru
  • Printing the huge arrays to screen makes your time go up, so `@time include("code.jl")` is computing compilation plus run and print time. –  Apr 13 '21 at 21:53
  • @ForceBru Thanks for your efforts in providing a thorough answer. I want to emphasize that I love Julia and believe this language is the future. That said, building a sysimage is beyond what the busy average Joe is willing to do or simply has time to do. Using the REPL is the first thing I tried with the code that I cannot post as it does not belong to me. And guess what? Still slow as hell. So Pluto will probably not do. – Tarik Apr 14 '21 at 01:02
  • To conclude: as is the case with the time-to-first-plot problem, this case, and I assume many other such cases, these issues need to be addressed instead of hiding behind compile time, to-the-nanosecond benchmarking, or options that are not viable for the user. Otherwise, Julia's adoption will take way longer than it should. Smart caching of compiled code saved to disk could be an option. Python compiles scripts to pyc files containing Python bytecode. Why not follow a similar strategy, so as to provide a system that just works? – Tarik Apr 14 '21 at 01:09
  • 2
    Slow startup times are acknowledged and being actively worked on in the Julia community. On the other hand, I think you may have misunderstood the intention behind the comments. You asked for an explanation of the difference in performance. When I asked for details about how you ran your measurements, it was not to "hide behind" compilation times or benchmarking details; it was to be able to form an informed opinion on what was going on. @ForceBru took it and created a way to reproduce, but as you say, that is more than what the average Joe should expect to do when trying to provide help online. – aramirezreyes Apr 14 '21 at 04:22
  • @Tarik Building a sysimage is very easy in Julia: just use the `PackageCompiler.jl` package. It will do the whole process in two to three lines of code. You can even include the precompilation of your Julia file to speed things up further. If you want, I can post the method to build one! – Mohammad Saad Apr 14 '21 at 09:58
  • @MohammadSaad As mentioned, many in the scientific community do not have time for any of this. They just want to get things done. If Julia is to become the better alternative to Python, or MATLAB for that matter, it should just work out of the box without any manual manipulation. And thanks for your offer to post a how-to; I know how to Google things. Using the sysimage approach, do you think Julia will beat Python's 6 seconds on a cold start? I personally doubt it. That's all the OP was about. – Tarik Apr 16 '21 at 00:53
  • 1
    @MohammadSaad OK, built a custom sysimage and got 3 seconds total runtime for Julia vs Python's 6 seconds. Still far from the advertised performance. – Tarik Apr 16 '21 at 13:25
  • @Tarik, what's the "advertised performance" then? Is a 2x speedup not enough? – ForceBru Apr 16 '21 at 19:57
  • @ForceBru This is a typical Julia ad: https://julialang.org/benchmarks/. Julia is supposedly 1 to 2 orders of magnitude faster than Python. – Tarik Apr 16 '21 at 20:31
  • @Tarik, it's not like _all_ Julia code is two orders of magnitude faster than any Python code. SciPy may very well be calling some super-optimized Fortran under the hood, for instance. Julia's CAS are very young, so Symbolics or QuadGK could be poorly optimized in terms of algorithms (I'm just guessing here), Julia itself is very young, so there's a lot of room for improvement - there could be a lot of factors at play that make this Julia code "only" 2 times faster than Python code. Again, you could profile your Julia code to see what takes the most time – ForceBru Apr 16 '21 at 20:46
  • @ForceBru Let me admit that I should not have been venting frustration here. Thanks all for the time spent responding to this question. I have been doing quite a bit of reading and playing with PackageCompiler. It clarified lots of things. That said, I struggled beyond building the basic sysimage. I found out the docs are somewhat outdated. Example: relegated under upgrade notes: "The julia_main function for executables should no longer take any arguments (just access the global ARGS) and no longer need to be annotated with Base.@ccallable". Failing to go to this section can waste hours. – Tarik Apr 17 '21 at 02:06