Unfortunately, the only way would be to write your own kernel, as there are no "automatic" ways to convert a non-batched kernel to a batched one (writing a well-performing batched version of a kernel is by itself a scientific paper that can easily get accepted to a high-profile HPC conference).
Are you sure you actually need the inverse itself? Operations involving the inverse can usually be rephrased as the solution of a linear system, which you could handle with `cusolverDn<t>potrsBatched` (applied to the Cholesky factors produced by `cusolverDn<t>potrfBatched`).
If you really do need the explicit inverse, the only way I can think of that avoids writing CUDA code is to call `cusolverDn<t>potrsBatched` with the right-hand sides `Barray` set to a batch of identity matrices. The solutions `Xi` of the systems `Ai * Xi = I` (which overwrite `Barray`) are then the inverses of the matrices in `Aarray`. This needs extra memory and is not as efficient as a dedicated inversion kernel, but it should still be faster than inverting the matrices one at a time.
Another option would be to forget that the matrices are symmetric and treat them as general matrices. You could then use the MAGMA library and its `magma_dgetri_outofplace_batched()` function to invert the matrices from their batched LU factorization (again not in place). Unfortunately, MAGMA does not support a batched symmetric inverse either.