
cuSOLVER has Cholesky decomposition, unlike cuBLAS. I see cusolverDnDpotrsBatched and cusolverDnDpotrfBatched, but unfortunately I can't seem to find cusolverDnDpotriBatched in the documentation.

Is there any way I can batch cusolverDnDpotri without massive overhead, or a way to do the equivalent of what the API would have done?

    The really short answer is no – talonmies Jul 08 '19 at 15:45
  • @talonmies Is there a reason it wasn't included in the cusolver update as well? – Krupip Jul 08 '19 at 16:01
  • That question would have to be directed to NVIDIA. I am not them – talonmies Jul 08 '19 at 16:03
  • Writing a batched kernel is not as simple as adding `Batched` to the end of a routine. It's weeks or even months of developer time to do it well and optimize the code properly. Knowing NVIDIA, they focus their efforts on the most pressing issues, since they (like any other company) do not have spare developer time to work on things not needed by their customers. – gflegar Jul 08 '19 at 16:05

1 Answer


Unfortunately, the only way would be to write your own kernel, as there are no "automatic" ways to convert a non-batched kernel to a batched one (writing a well-performing batched version of a kernel is by itself a scientific paper that can easily get accepted to a high-profile HPC conference).

Are you sure you actually need the inverse? Operations with the inverse can usually be expressed as the solution of a linear system, for which you could use cusolverDnDpotrsBatched.

If you really need the inverse, the only way I can think of that avoids writing CUDA code would be to call cusolverDnDpotrsBatched with the right-hand sides Barray set to a batch of identity matrices. This way, the solutions Xi of the systems Ai * Xi = I (which overwrite Barray) are the inverses of the matrix batch Aarray. It does need extra memory, and it is not as efficient as writing a dedicated kernel for the inverse, but it should be faster than inverting the matrices sequentially.
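To illustrate the identity-right-hand-side trick, here is a minimal CPU sketch in NumPy (not the cuSOLVER call itself, just the same math): solving Ai * Xi = I for each matrix in the batch recovers the batch of inverses, exactly what the proposed cusolverDnDpotrsBatched call would compute on the GPU.

```python
import numpy as np

# Build a batch of 4 random 3x3 SPD matrices (M @ M.T + n*I is SPD),
# standing in for the Aarray that was factorized with potrfBatched.
rng = np.random.default_rng(0)
n, batch = 3, 4
M = rng.standard_normal((batch, n, n))
A = M @ M.transpose(0, 2, 1) + n * np.eye(n)

# The identity-RHS trick: solving A_i @ X_i = I for each i yields A_i^{-1}.
# np.linalg.solve broadcasts over the leading batch dimension, mirroring
# a batched solve with Barray set to identity matrices.
I_batch = np.broadcast_to(np.eye(n), (batch, n, n))
A_inv = np.linalg.solve(A, I_batch)

# The result matches the direct batched inverse.
assert np.allclose(A_inv, np.linalg.inv(A))
```

As in the GPU version, the identity right-hand sides cost an extra n-by-n buffer per matrix, which is the memory overhead mentioned above.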

Another option would be to forget that the matrices are symmetric and treat them as general matrices. You could then use the MAGMA library and its magma_dgetri_outofplace_batched() function to invert the matrices (again, not in-place). Unfortunately, MAGMA also does not support a batched version of the symmetric inverse.

gflegar
  • Right now, solving for the `x` solution directly would greatly decrease the performance of the program unless I essentially re-write the Cholesky solver. In `Ax = b`, `b` could be generated easily, but there are a very large number of `b`s and an order of magnitude fewer `A`s to deal with. This means it is critical that I *not* read the `b`s from memory, but instead generate them on the fly, which would only be possible by writing my own solver. Replacing the non-Cholesky inverter with a Cholesky one would potentially increase performance without many code changes. – Krupip Jul 08 '19 at 18:47