Suppose you have a huge (40+ GB) matrix of floating-point feature values, where the rows are different features and the columns are the samples/images.
The table is precomputed column-wise. After that, the whole matrix is read row-wise by multiple threads (each thread loads a whole row), and this is repeated several times.
What would be the best way to handle this matrix? I'm especially pondering five points:
- Since it runs on an x64 PC, I could memory-map the whole matrix at once, but would that make sense?
- What are the effects of multithreading, both for the row-wise reads and if the initial computation is multithreaded as well?
- How should the matrix be laid out: row-major or column-major?
- Would it help to mark the matrix as read-only once the precomputation has finished?
- Could something like madvise (http://www.kernel.org/doc/man-pages/online/pages/man2/madvise.2.html) be used to speed things up?
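
To make the setup concrete, here is a minimal sketch of what I have in mind, in C on Linux. It assumes a file-backed, row-major mapping; the helper names (`matrix_create`, `matrix_at`, `matrix_freeze`, `matrix_destroy`) are hypothetical and only for illustration. Whether row-major is the right layout and which `madvise` hint (if any) helps are exactly the open questions above, so the advice value is left as a parameter.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes madvise() in glibc under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file holding rows*cols floats
 * and map it read-write and shared, so the precomputation can fill it in.
 * Returns NULL on failure. */
static float *matrix_create(const char *path, size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0); /* mmap() rejects a zero-length mapping */
    size_t bytes = rows * cols * sizeof(float);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (float *)p;
}

/* Row-major addressing (an assumption): feature f, sample s.
 * A whole feature row is then one contiguous run of cols floats. */
static inline float *matrix_at(float *m, size_t cols, size_t f, size_t s)
{
    return m + f * cols + s;
}

/* After the precomputation: drop write permission and pass an access hint
 * (for example MADV_SEQUENTIAL or MADV_WILLNEED) for the whole mapping.
 * Returns 0 on success, -1 on failure. */
static int matrix_freeze(float *m, size_t rows, size_t cols, int advice)
{
    size_t bytes = rows * cols * sizeof(float);
    if (mprotect(m, bytes, PROT_READ) != 0)
        return -1;
    return madvise(m, bytes, advice);
}

/* Unmap the matrix; the backing file is left on disk. */
static int matrix_destroy(float *m, size_t rows, size_t cols)
{
    return munmap(m, rows * cols * sizeof(float));
}
```

The intended use would be: call `matrix_create` once, let the precomputation write each sample column through `matrix_at`, call `matrix_freeze`, and then have each reader thread scan `cols` consecutive floats starting at `matrix_at(m, cols, f, 0)` for its feature `f`. With this layout the column-wise precomputation writes with a stride of `cols` floats, which is the trade-off I'm unsure about.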