I ran some experiments with matrix inversion to understand the performance of matrix algorithms on different programming platforms. For a 4000x4000 positive-definite real matrix, the NumPy code
import time
import numpy as np

tic = time.perf_counter()
sts4098_inverse = np.linalg.inv(sts4098)  # sts4098 is the 4000x4000 SPD matrix
toc = time.perf_counter()
print(f"Numpy inversion in {toc - tic:0.4f} seconds")
Numpy inversion in 0.9607 seconds
takes around 1 second. I then tried the CLAPACK interface from Apple's Accelerate framework, like this:
cout << "Using LAPACK to perform Cholesky inversion... " << endl;
char UPLO = 'U';                        // operate on the upper triangle
int N = n, LDA = n, info = 0;
int len_chol = n * (n + 1) / 2;         // size of packed triangular storage
double* ap = (double*)malloc(len_chol * sizeof(double));
system_clock::time_point t1, t2;
t1 = system_clock::now();
// Cholesky factorization of the full-storage matrix a (column-major)
dpotrf_(&UPLO, &N, a, &LDA, &info);
if (info != 0) cerr << "dpotrf_ failed, info = " << info << endl;
// Pack the upper triangular factor, column by column (LAPACK is column-major!)
int ptr = 0;
for (int j = 0; j < n; j++) {           // column index
    for (int i = 0; i <= j; i++) {      // row index; upper triangle: i <= j
        ap[ptr++] = a[n * j + i];
    }
}
// Inversion from the packed Cholesky factor
dpptri_(&UPLO, &N, ap, &info);
if (info != 0) cerr << "dpptri_ failed, info = " << info << endl;
t2 = system_clock::now();
duration<double, milli> ms = t2 - t1;
double time = ms.count() / 1000;
cout << fixed;
cout << "LAPACK Cholesky inversion takes " << time << "s" << endl;
return ap;
LAPACK Cholesky Inversion takes 5.539074s
The inversion takes more than 5 seconds according to chrono::system_clock. This result seems impossible to me, since I compiled with "-Ofast" and "-mcpu=apple-m1". My laptop is an M1 Max model.
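For what it's worth, the NumPy timing is easy to reproduce without my data; this is a sketch where the random SPD matrix `spd` is my stand-in assumption, since `sts4098` itself isn't shown:

```python
import time
import numpy as np

n = 4000
# Hypothetical stand-in for sts4098: a well-conditioned SPD matrix
rng = np.random.default_rng(0)
m = rng.standard_normal((n, n))
spd = m @ m.T + n * np.eye(n)

tic = time.perf_counter()
spd_inverse = np.linalg.inv(spd)
toc = time.perf_counter()
print(f"Numpy inversion in {toc - tic:0.4f} seconds")
```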
Is there anything I missed? How can NumPy be that much faster?
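Update: to see whether the packed routine itself matters, I can also drive the full-storage Cholesky pair `dpotrf`/`dpotri` through SciPy's low-level LAPACK wrappers. This is a sketch assuming SciPy is installed; the random `spd` matrix is again a stand-in for `sts4098`:

```python
import time
import numpy as np
from scipy.linalg.lapack import dpotrf, dpotri

n = 4000
# Hypothetical stand-in for sts4098: a well-conditioned SPD matrix
rng = np.random.default_rng(0)
m = rng.standard_normal((n, n))
spd = m @ m.T + n * np.eye(n)

tic = time.perf_counter()
c, info = dpotrf(spd, lower=0)      # Cholesky factor, upper triangle
assert info == 0
inv_tri, info = dpotri(c, lower=0)  # inverse computed from the factor
assert info == 0
# dpotri fills only the upper triangle; symmetrize to get the full inverse
inv = np.triu(inv_tri) + np.triu(inv_tri, 1).T
toc = time.perf_counter()
print(f"LAPACK dpotrf+dpotri via SciPy in {toc - tic:0.4f} seconds")
```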