I'm doing some calculation in Julia and noticed it runs significantly slower (about 25 times!) than its numpy counterpart.
Then I realized Julia is only using 8 of the 96 CPU threads (48 physical cores) on my PC, while numpy seems to have no problem utilizing well over 70 threads.
Running Julia with $ julia --threads 96 does not make any difference, even though julia> Threads.nthreads() returns 96.
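For context on the numpy side: as far as I understand, numpy's multithreading in routines like cov/dot comes from the BLAS library it is linked against, so the usual thread-count knobs are environment variables. Which variable actually applies depends on the BLAS build (OpenBLAS, MKL, ...), which I'm only assuming here. A minimal sketch to inspect the current settings:

```python
import os

# Logical CPU count as seen by the OS.
print("logical CPUs:", os.cpu_count())

# Common BLAS thread-count overrides; which one matters depends on
# the BLAS numpy was built against (OpenBLAS, MKL, ...).
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    print(var, "=", os.environ.get(var, "<unset>"))
```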
A further disappointment: I suspect that even with all 96 threads, Julia still might not match numpy's speed.
Here is the Julia code. I simply measure the time with julia> @time calc(mat_a, mat_b), which gives 90 seconds on average.
using Statistics
using LinearAlgebra
function calc(A::Array{Float32,2}, B::Array{Float32,2})
    μ_A = mean(A, dims=2)
    μ_B = mean(B, dims=2)
    σ_A = cov(A, dims=2)
    σ_B = cov(B, dims=2)
    ssdiff = sum((μ_A - μ_B).^2)
    covmean = sqrt(σ_A * σ_B)
    res = ssdiff + tr(σ_A .+ σ_B .- 2.0 * covmean)
    return res
end
Here is the numpy code, which takes about 3.5 seconds on average, measured with time.perf_counter().
import numpy as np
from numpy import cov
from numpy import trace
from scipy.linalg import sqrtm
def calc(A, B):
    mu_A = A.mean(axis=0)
    mu_B = B.mean(axis=0)
    sigma_A = cov(A, rowvar=False)
    sigma_B = cov(B, rowvar=False)
    ssdiff = np.sum((mu_A - mu_B) ** 2.0)
    covmean = sqrtm(sigma_A.dot(sigma_B))
    res = ssdiff + trace(sigma_A + sigma_B - 2.0 * covmean)
    return res
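For completeness, the timing setup looks roughly like this. The matrix shapes below are placeholders, not my real inputs (the real matrices are much larger); the data is random just to make the sketch self-contained:

```python
import time
import numpy as np
from scipy.linalg import sqrtm

def calc(A, B):
    mu_A = A.mean(axis=0)
    mu_B = B.mean(axis=0)
    sigma_A = np.cov(A, rowvar=False)
    sigma_B = np.cov(B, rowvar=False)
    ssdiff = np.sum((mu_A - mu_B) ** 2.0)
    covmean = sqrtm(sigma_A.dot(sigma_B))
    return ssdiff + np.trace(sigma_A + sigma_B - 2.0 * covmean)

# Placeholder sizes -- the real matrices are much larger.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 64)).astype(np.float32)
B = rng.standard_normal((256, 64)).astype(np.float32)

t0 = time.perf_counter()
res = calc(A, B)
elapsed = time.perf_counter() - t0
print(f"calc took {elapsed:.3f} s, result = {res}")
```

Note that sqrtm can return an array with a small imaginary component due to floating-point error, so the result may print as complex.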
Any suggestion/explanation would be greatly appreciated!