
I have n (very large) independent linear systems Ax = b_i. They all have the same A, but b_i is different for i = 1, ..., n. I want to solve these n systems in parallel in CUDA.

I was thinking that it might be most efficient to do the LU factorization of A on the host and then copy the factorized A to the GPU's constant memory (because even if I did the LU on the device, only one thread would do it while the other threads sat idle; besides, constant memory is faster). Is there any better way for this?

Another issue is that while all the threads are solving their systems at the same time with the same algorithm, they are all accessing the same memory location (A[i]) at the same time, which is not coalesced. How can I optimize this?
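To make this concrete, here is a rough sketch of the kernel I have in mind. The dimension `N`, the packed row-major `LU` layout, and all names are placeholders, and pivoting is omitted, so this only illustrates the approach:

```cuda
#define N 32  // assumed matrix dimension; N*N*sizeof(double) must fit in the 64 KB constant memory

// L and U packed into one matrix (L has an implicit unit diagonal),
// computed on the host and uploaded once with:
//   cudaMemcpyToSymbol(LU, hostLU, sizeof(double) * N * N);
__constant__ double LU[N * N];

// One thread per system: thread i solves A x_i = b_i by forward/back
// substitution. All threads read the same LU element at the same time,
// which the constant cache serves as a single broadcast per warp.
__global__ void solve_all(const double *b, double *x, int nsys)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nsys) return;

    double y[N];  // per-thread scratch, overwritten in place

    // Forward substitution: L y = b_i.
    for (int r = 0; r < N; ++r) {
        double s = b[i * N + r];
        for (int c = 0; c < r; ++c)
            s -= LU[r * N + c] * y[c];
        y[r] = s;
    }

    // Back substitution: U x_i = y (y[c] for c > r already holds solution values).
    for (int r = N - 1; r >= 0; --r) {
        double s = y[r];
        for (int c = r + 1; c < N; ++c)
            s -= LU[r * N + c] * y[c];
        y[r] = s / LU[r * N + r];
        x[i * N + r] = y[r];  // note: this per-system layout of b and x is itself not coalesced
    }
}
```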

asked by Farzad, edited by talonmies
    "Is there any better way for this?" The CUSOLVER library (including the less obvious device methods which are depicted in CUDA sample codes) provides methods for handling problems like this. Rather than worry about low-level considerations such as coalescing which would come about through writing your own code, you might want to investigate a method using libraries. – Robert Crovella Mar 24 '19 at 14:50

1 Answer


(This is assuming A is a stably invertible n x n matrix.)

Don't solve a much harder problem just because it seems to parallelize better

Let B be the matrix whose columns are b_1 ... b_n. Under our assumptions about A, you actually need to solve the single matrix equation A X = B for an n x n matrix of unknowns X, i.e. your solution is X = A^{-1} B.

So basically you have one matrix inversion and one matrix multiplication. This holds regardless of what software and hardware you're going to use. For the inversion and the multiplication, just use cuBLAS, or cuSPARSE, or cuSOLVER, or ArrayFire, or whatever solves these things the fastest.

You could do both of them together, I suppose, but I'm not sure there are optimizations for that.
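One way to realize this with the libraries mentioned above is to factor A once with cuSOLVER and then back-solve all n right-hand sides in a single call, which gets the same asymptotic win without forming A^{-1} explicitly. A minimal sketch, assuming double precision, column-major storage, and device buffers that are already populated (names are placeholders, error checking omitted):

```cuda
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Solve A X = B on the device: dA is m x m, dB is m x nrhs (column-major);
// dB is overwritten with the solution X.
void solve_all_rhs(double *dA, double *dB, int m, int nrhs)
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    int lwork = 0;
    cusolverDnDgetrf_bufferSize(handle, m, m, dA, m, &lwork);

    double *dWork;
    int *dIpiv, *dInfo;
    cudaMalloc(&dWork, sizeof(double) * lwork);
    cudaMalloc(&dIpiv, sizeof(int) * m);
    cudaMalloc(&dInfo, sizeof(int));

    // One LU factorization with partial pivoting: P A = L U.
    cusolverDnDgetrf(handle, m, m, dA, m, dWork, dIpiv, dInfo);

    // One call back-solves all nrhs right-hand sides against the factors.
    cusolverDnDgetrs(handle, CUBLAS_OP_N, m, nrhs, dA, m, dIpiv, dB, m, dInfo);

    cudaFree(dWork);
    cudaFree(dIpiv);
    cudaFree(dInfo);
    cusolverDnDestroy(handle);
}
```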

answered by einpoklum
  • This is mathematically misleading at best. Nobody, ever, performs "matrix inversion and multiplication" to solve a linear system, and that isn't what the question was proposing either – talonmies Mar 24 '19 at 06:37
  • @talonmies: Solving n different n x n linear systems is necessarily Omega(n^3) operations (just for doing anything with all elements of the matrix). Matrix inversion and matrix multiplication are n^2.37 or whatever the effective implemented optimum is these days. Not sure what you mean by "mathematically misleading" - did I get something wrong, algebraically? – einpoklum Mar 24 '19 at 08:35
  • You're assuming that `A` is square and invertible. Why? – Rodrigo de Azevedo Mar 24 '19 at 14:23
  • `A` is normally square for solving an ordinary system of linear equations, and there is nothing that I see in the question that suggests otherwise. A non-square `A` implies either an underconstrained or overconstrained system, and these are (IMO) different problems, and usually involve different solution methods, and again, there is nothing in the question that suggests the problem being asked about is in one of these categories. – Robert Crovella Mar 24 '19 at 14:46
  • @RodrigodeAzevedo: What Robert said; plus, if A is not invertible, then there's no general solution to Ax = b. It might still be solvable, of course. I'll add my assumption. – einpoklum Mar 24 '19 at 15:38
  • @RobertCrovella What is "normal" or not is highly dependent on the community to which one belongs. I would rather ask the OP for more information than to make nice assumptions. – Rodrigo de Azevedo Mar 24 '19 at 15:44
  • @einpoklum I agree with talonmies about the "mathematically misleading". There is no reason to invert a matrix numerically, even less to solve a linear system of equations; it's not the way it's done. It will likely lead to less accurate results, in the cases where a valid result is obtainable at all, which is not always. I am quite sure all the libraries you mention solve `Ax=b` without inverting the matrix. – Ander Biguri Mar 25 '19 at 09:32
  • @AnderBiguri: I explained the rationale for the inversion. What you describe would take Omega(n^3) time given n column vectors. – einpoklum Mar 25 '19 at 10:01
  • @einpoklum what I describe is an entire field of mathematics, not an algorithm. You cannot describe it as O(n^3). You could also add 1 to all the values and it would be faster, but that does not make it correct. There are multiple tutorials that explain why inverting a matrix is a terrible idea numerically, and it's not related to speed, but to the validity of the result. – Ander Biguri Mar 25 '19 at 10:12
  • @AnderBiguri: You're confusing O(*) notation with Omega(*) notation, which makes me suspect you may not have followed my explanation regarding the overall complexity of the computation. – einpoklum Mar 25 '19 at 10:22
  • @einpoklum I have indeed, but mainly because my point is that the discussion about using a matrix inverse has nothing to do whatsoever with computational complexity; it's just about numerical stability. – Ander Biguri Mar 25 '19 at 11:07
  • @AnderBiguri: I agree that stability could be an issue and will add a note about that. However, it has _everything_ to do with time complexity. – einpoklum Mar 25 '19 at 11:22
  • @einpoklum it has to do with ill-posedness, ill-conditioning and some other numerical problems matrices have. I wish I could invert my matrices in my research. They are too unstable to give correct results though. As talonmies said in his first comment *"Nobody, ever, performs "matrix inversion and multiplication" to solve a linear system"*. This is true. [This is a good starting point to read about why](https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/) – Ander Biguri Mar 25 '19 at 11:49