I have a large file on disk (several terabytes) representing a matrix of 64-bit floating-point numbers. I have a Python script that performs linear algebra operations on a subset of the rows of this matrix.
With NumPy, I use numpy.memmap to map the file into a NumPy array and then operate on the relevant rows of that array. However, this is somewhat slow, and I now want to use CuPy to do the same task on a GPU to speed things up. CuPy doesn't seem to have an equivalent of numpy.memmap, though.
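For reference, here is a minimal, runnable sketch of my current NumPy workflow. The file name, shape, and the specific operation (a Gram matrix on the selected rows) are placeholders; the snippet writes a tiny stand-in file so it runs end to end, whereas the real file is several terabytes:

```python
import numpy as np

# Illustrative stand-in for the multi-terabyte file: write a small float64
# matrix to disk so the snippet is runnable end to end.
n_rows, n_cols = 8, 4
np.arange(n_rows * n_cols, dtype=np.float64).reshape(n_rows, n_cols).tofile("matrix.bin")

# memmap maps the file into virtual memory; pages are read lazily from disk,
# so opening it does not load the whole matrix into RAM.
mat = np.memmap("matrix.bin", dtype=np.float64, mode="r", shape=(n_rows, n_cols))

# Fancy indexing a subset of rows copies just those rows into an in-RAM ndarray.
rows = [1, 3, 5]
sub = np.asarray(mat[rows])      # shape (3, n_cols)
gram = sub @ sub.T               # example linear algebra op on the selected rows
```

In the real script, only the indexed rows are ever materialized in RAM; the question is what the analogous pattern looks like when the destination is GPU memory.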
What are the best practices for this scenario in CuPy? I have come across the concepts of "pinned memory" and "zero-copy memory" while looking into this, but I'm not sure whether either is the appropriate solution for the problem at hand.