I need to compute the SVD (singular value decomposition) of a multi-dimensional array.
The array can have a shape like [31329, 128, 36].
If you look up cupy.linalg.svd, you will find that the SVD has to be broadcast over all 31329 2D matrices of shape [128, 36].
I am using an A100 GPU, and the computation takes quite a long time for an array of this size.
Therefore, I wonder whether the SVD computation in CuPy can be sped up further.
Thank you in advance for your time and help!
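
For reference, the computation described above can be sketched roughly as follows. This is a minimal illustration, not my exact code: it assumes a CuPy version whose cupy.linalg.svd accepts stacked input of shape (..., M, N), it falls back to NumPy when no GPU is available, and the helper name batched_svd, the random data, and the small batch size are hypothetical stand-ins.

```python
import numpy as np

try:
    import cupy as xp                  # GPU path; needs a CUDA-capable device
    xp.cuda.runtime.getDeviceCount()   # raises if no usable device is present
except Exception:
    xp = np                            # CPU fallback so the sketch still runs


def batched_svd(a):
    """Reduced SVD of every 2D matrix in a stack of shape (batch, m, n).

    batched_svd is a hypothetical helper name used only for this sketch.
    """
    return xp.linalg.svd(a, full_matrices=False)


# Small stand-in batch; the real array has shape (31329, 128, 36).
batch, m, n = 64, 128, 36
rng = np.random.default_rng(0)
a = xp.asarray(rng.standard_normal((batch, m, n)).astype(np.float32))

u, s, vh = batched_svd(a)

# Sanity check: rebuild each matrix from its factors, U * diag(s) @ Vh.
recon = (u * s[:, None, :]) @ vh
max_err = float(xp.abs(recon - a).max())
```

Setting batch to 31329 reproduces the shape from the question; full_matrices=False keeps U at [128, 36] per matrix rather than [128, 128].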