
I have reached my limit on the following problem:

As part of my FEA code (in MATLAB) I need to solve for x in x = A\b. Both A and b are sparse, complex, and double precision: A is an (n,n) matrix and b an (n,1) vector, where n is 850,000 and can grow up to 2,000,000. In addition, A is symmetric and mostly diagonal.

I have two HPC clusters at my disposal, one with 5.7 TB of RAM and the other with 1.5 TB (but faster cores). Technically I can solve the system as is and just wait approximately 15 days. However, I need to solve the system of equations up to 10 times per simulation, so 15 days per solve is not an acceptable amount of time.

I tried iterative methods, but these did not yield the same results as the backslash operator, and in some cases they failed to converge.

I have compiled the x = A\b step into a MEX file (mexa64) to potentially reduce the time. However, I fear it will still take days.

Any suggestions on how to tackle this problem? Are there any commercial codes that can do this faster or more efficiently? And how do commercial FEA packages solve this problem when a model has over 1 million nodes?

Any help is much appreciated! Thanks in advance.

user5489
  • Once you get to this level, you need problem-specific code to accelerate. Often, large-scale problems do not store the matrix `A` in memory at all, and this can come with a tremendous speed improvement. It is possible because the operations `A*x` and `A'*b` can be computed on the fly. E.g. I made a tomography toolbox that computes these operations on the fly, which means that there is no memory requirement, and an acceleration of more than 1000x over a matrix-storing solution. But it only works for a particular type of tomography. You need to find the equivalent for your case – Ander Biguri Feb 13 '20 at 15:27
  • BTW, MATLAB's `A\b` uses highly accelerated code under the hood; I doubt you can go faster than what MATLAB gives using the same method – Ander Biguri Feb 13 '20 at 15:28
  • You need to solve several systems of equations. Is the `A` matrix common to some of them by any chance? Any structure or restrictions that can be used? Otherwise, I agree with Ander: it seems hard to beat MATLAB's speed in solving linear systems – Luis Mendo Feb 13 '20 at 15:37
  • Is there any "structure" in the `A` matrix? Is it symmetric, triangular, orthogonal, etc? Sometimes the solution can be computed without inverting the matrix, but using special matrix factorizations; although `A\b` does not really invert the `A` matrix either. – Kavka Feb 13 '20 at 16:39
  • Yes, the matrix A is symmetric and mostly diagonal. I have updated the question accordingly. I understand that A\b is an extremely efficient way to solve the problem in MATLAB. However, how do commercial FEA packages solve it when they're dealing with such problems? – user5489 Feb 13 '20 at 17:14
  • Haven't used FEA for some years, but don't they all use approximation when it comes to such large numbers of nodes? – Daniel Feb 13 '20 at 19:09
  • If you need more speed, you're probably going to have to go with an iterative solver. If whatever iterative solver you've tried was converging too slowly or not at all (even though it's appropriate for the type of matrix you have), then you need to find a better preconditioner. That's dependent on the structure of the problem, the type of matrix, and any domain knowledge that you can bring to bear. – bg2b Feb 13 '20 at 19:41
  • Sparse and wavefront solvers can manage it without storing the zeroes. Maybe that's a better solution for you. – duffymo Feb 13 '20 at 19:57
  • @user5489, this may well have zero effect, but don't make `b` sparse if it isn't sparse. If it is sparse... then there's room for approaching the problem another way. I agree with Ander Biguri: at this level there's no general solution. – Enlico Feb 13 '20 at 20:07
  • I suggest reading about solving matrix-free linear systems of equations, using iterative solvers. Like Ander Biguri wrote, you don't even have to assemble your global matrix. If I'm not mistaken, ANSYS does this, but don't quote me on that! Also, [PETSc](https://www.mcs.anl.gov/petsc/) has many super efficient direct-solvers and iterative solvers for C++ and Fortran, and there is even a port for Python, called [petsc4py](https://petsc4py.readthedocs.io/en/stable/). It isn't plug and play, though. Both options - matrix-free and PETSc - allow parallelization. PETSc can use MUMPS too. – Breno Apr 30 '20 at 20:32
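To make the matrix-free idea from the comments concrete, here is a minimal MATLAB sketch. `pcg` accepts a function handle in place of an explicit matrix, so `A` never has to be stored; this only applies if A is Hermitian positive definite. The routine `applyA` below is hypothetical — it stands in for whatever problem-specific code computes A*v on the fly; the tolerance and iteration cap are illustrative.

```matlab
% Matrix-free solve: pcg only ever needs the action of A on a vector,
% so A can be applied element-wise on the fly instead of being stored.
afun = @(v) applyA(v);        % applyA: hypothetical problem-specific A*v routine
tol = 1e-8; maxit = 5000;     % illustrative values
[x, flag] = pcg(afun, b, tol, maxit);
```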

2 Answers

If your matrix is Hermitian (Aij = Aji*) and positive definite, then you can't go wrong with a conjugate gradient approach. It is almost always the fastest method when you have a single right-hand side, and it is guaranteed to converge. Additionally, it allows you to supply a starting guess (for instance, from a previous time step), which can be beneficial. While the solution will differ from that of a direct decomposition (how could it not, with a gazillion coefficients), it's probably not that different.
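A sketch of this in MATLAB, assuming A really is Hermitian positive definite; the tolerance, iteration cap, and the previous-step solution `x_prev` are illustrative. A Jacobi (diagonal) preconditioner is shown because it is cheap and works for complex matrices; a stronger preconditioner will usually pay off on hard problems.

```matlab
% Preconditioned CG with a warm start from a previous solution x_prev.
n = size(A, 1);
M = spdiags(diag(A), 0, n, n);     % Jacobi (diagonal) preconditioner
[x, flag, relres, iter] = pcg(A, b, 1e-8, 5000, M, [], x_prev);
if flag ~= 0
    warning('pcg did not converge: flag %d, relres %g', flag, relres);
end
```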

If it's literally a symmetric complex matrix (Aij = Aji), then that is very unusual, but maybe still better than a general complex matrix. In either case, try using a routine that only requires you to pass half of the matrix (I don't think MATLAB supports this functionality). That will cut your memory usage in half and might make things a bit faster (moving 16 GB worth of numbers is slower than moving 8 GB).
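It is worth checking explicitly which of the two cases you have, since they call for different solvers; a quick MATLAB check:

```matlab
% A' is the conjugate transpose and A.' the plain transpose, so these
% two tests distinguish Hermitian from (complex) symmetric matrices.
ishermitian(A)    % true iff A == A'  (Hermitian: a candidate for pcg)
issymmetric(A)    % true iff A == A.' (complex symmetric)
```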

Finally, I can't find an answer on whether MATLAB supports the Intel MKL inspector-executor sparse BLAS routines. In my experience, using MKL in my bottlenecks always improves performance. It's closed source and you have no idea what's happening in there -- probably magic.

Charlie S

Sparse matrix computation is a big area in itself. Since MATLAB does not let you see or modify its source code, the main thing you can do is get more specific and use a more customizable solution. As far as I know, there are two main families of methods for solving a sparse linear system: direct and iterative, each with many algorithms. For example, Cholesky decomposition has very good performance and low memory usage, but only works on SPD (symmetric positive definite) matrices. Other factorizations are LU, LDL, QR, etc.; each suits specific matrix types.

MATLAB also uses one of these methods behind the scenes (I'm not certain, but I've read somewhere that it uses SuiteSparse code underneath the A\b operator). Based on the type of your matrix (symmetric positive definite, general non-symmetric, etc.) you need to choose one scheme (either iterative or direct). I think direct methods suit such a large system better (because of rounding errors). There are also GPU-based solvers available for both direct and iterative methods, highly parallelized, but they may not be suitable for your big problem, where the matrix being factorized is more than 8 GB by itself.
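One MATLAB-level optimization worth mentioning, since the question involves up to 10 solves per simulation: if several solves share the same A, the `decomposition` object (available since R2017b) factors once and reuses the factorization for each right-hand side. A sketch, assuming the right-hand sides are columns of a hypothetical array `B`:

```matlab
% Factor A once, then reuse the stored factors for every right-hand side.
dA = decomposition(A, 'ldl');   % 'ldl' suits symmetric A; use 'chol' if it is HPD
for k = 1:size(B, 2)
    x = dA \ B(:, k);           % each solve reuses the factorization
end
```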

Steps you should do:

  • Identify the type of your matrix (SPD, symmetric, or general).
  • Choose the best direct method for your matrix.
  • Find a free or commercial package that can handle such a factorization (examples are Eigen, Intel MKL, SuiteSparse), has optimized code, and supports parallel computation.
  • Get your hands a little dirty: write a very small program (for example in C++) that reads the matrix A and vector b from a file and finds x using one of these packages (they have ready-to-use example code).
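For the last step, the hand-off from MATLAB to an external program can be as simple as dumping the triplet form of A; a sketch (the file name and layout are illustrative):

```matlab
% Export A in triplet (COO) form plus b, in a v7.3 (HDF5) MAT-file so
% that arrays larger than 2 GB are supported; most external solver
% front ends can read HDF5 or a simple binary/ASCII dump of these arrays.
[i, j, v] = find(A);
save('system.mat', 'i', 'j', 'v', 'b', '-v7.3');
```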

Also, please keep in mind that each method and package can reorder the matrix before starting the computation, and choosing a good ordering can have a big effect on speed and even accuracy.
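For instance, in MATLAB a fill-reducing permutation can be computed up front; a sketch for a symmetric matrix:

```matlab
% symamd computes an approximate minimum degree ordering for a symmetric
% matrix; factorizing the permuted matrix A(p,p) typically produces far
% less fill-in than factorizing A directly.
p = symamd(A);
% e.g. R = chol(A(p,p)); if A is Hermitian positive definite
```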

This link should also be interesting:

Do not invert that matrix

Also two books on sparse matrices:

  • Sparse Matrix Technology by Sergio Pissanetzky
  • Direct Methods for Sparse Linear Systems by Timothy A. Davis
epsi1on