I'm consolidating my comments into one answer. It's not a solution, but an answer is easier to edit than a string of comments.
If you hope to find an efficient row shuffle regardless of sparse format, you have not studied the sparse matrix documentation enough. Only `csr` and `lil` store their data in a row-oriented fashion.
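
To make "row-oriented" concrete, here is a small sketch that prints what each of the two formats actually stores. The 3x3 matrix is made up for illustration; the attribute names (`indptr`, `indices`, `data`, `rows`) are the ones scipy exposes.

```python
import numpy as np
from scipy import sparse

# Made-up example matrix, for illustration only.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]])

csr = sparse.csr_matrix(dense)
lil = sparse.lil_matrix(dense)

# csr: three flat arrays; row i occupies the slice indptr[i]:indptr[i+1].
print(csr.indptr)   # row boundaries
print(csr.indices)  # column index of each stored value
print(csr.data)     # stored values, row after row

# lil: one Python list of column indices and one list of values per row.
print(lil.rows)
print(lil.data)
```

In `csr` the rows are contiguous slices of shared arrays, while in `lil` each row is a separate list object.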
I can imagine doing an in-place row shuffle with the `lil` format. While `csr` also stores data in a row-oriented manner, it packs all rows into flat arrays, so a row shuffle will be more complicated, and difficult to do in-place.
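
Here is a rough sketch of what I have in mind for `lil`. `shuffle_lil_rows` is a hypothetical helper of my own, not a scipy function, and the test matrix is made up. It only reorders the per-row lists, so the stored values themselves are never copied (a temporary array of references is created, though).

```python
import numpy as np
from scipy import sparse


def shuffle_lil_rows(mat, rng):
    """Permute the rows of a lil matrix by reordering its per-row lists.

    Hypothetical helper, not part of scipy. Returns the permutation used.
    """
    perm = rng.permutation(mat.shape[0])
    # mat.rows and mat.data are object arrays holding one list per row.
    # Fancy indexing copies only the references, not the lists themselves.
    mat.rows[:] = mat.rows[perm]
    mat.data[:] = mat.data[perm]
    return perm


# Made-up test matrix with distinct rows and some zeros.
base = np.arange(20).reshape(5, 4)
lil_mat = sparse.lil_matrix(base * (base % 2))

original = lil_mat.toarray()
lil_perm = shuffle_lil_rows(lil_mat, np.random.default_rng(0))

# The shuffled matrix should match dense row indexing with the same permutation.
print(np.array_equal(lil_mat.toarray(), original[lil_perm]))
```

The same permutation has to be applied to both `rows` and `data`, otherwise column indices and values end up mismatched.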
Tracing through the scikit shuffle, I see it just comes down to `matrix[index,:]` (where `index` is a sampling without replacement). That's the same as in the CSR link. For what it's worth, CSR indexing actually uses matrix multiplication, using a specially constructed 'extractor' matrix.
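
The following sketch illustrates the extractor idea; it is my own reconstruction, not scipy's internal code, and the matrix and the names `P` and `index` are just for illustration. Row `i` of `P` has a single 1 in column `index[i]`, so `P @ M` picks out the same rows as `M[index,:]`.

```python
import numpy as np
from scipy import sparse

# Made-up example matrix, for illustration only.
dense_m = np.array([[1, 0, 2, 0],
                    [0, 0, 3, 0],
                    [4, 5, 0, 0],
                    [0, 6, 0, 7]])
M = sparse.csr_matrix(dense_m)
n = M.shape[0]

# A permutation is a sampling of all row numbers without replacement.
index = np.random.default_rng(0).permutation(n)

# Row shuffle by plain indexing.
by_indexing = M[index, :]

# Row shuffle by multiplication with an 'extractor' (permutation) matrix:
# (P @ M)[i, :] equals M[index[i], :].
P = sparse.csr_matrix(
    (np.ones(n, dtype=M.dtype), (np.arange(n), index)), shape=(n, n)
)
by_matmul = P @ M

print(np.array_equal(by_indexing.toarray(), by_matmul.toarray()))
```

Either way the result is a new matrix; nothing here is done in-place.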
Shuffling lists is relatively efficient, in-place or not, since it just involves creating a new list of pointers/references to the row lists. A row shuffle of a dense numpy array by indexing, on the other hand, requires copying all the data. The copy can be done in compiled code, but it still requires enough buffer space for a whole copy.
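
A small sketch of that difference, with made-up data: shuffling a list of row lists only rearranges references to the same row objects, while indexing a dense array with a permutation builds a new array.

```python
import random
import numpy as np

# List of row lists: shuffling reorders references, the rows are untouched.
row_lists = [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
ids_before = {id(r) for r in row_lists}
random.shuffle(row_lists)  # in-place
ids_after = {id(r) for r in row_lists}
same_objects = ids_before == ids_after
print(same_objects)

# Dense array: advanced indexing with a permutation returns a full copy.
arr = np.arange(12).reshape(4, 3)
arr_perm = np.random.default_rng(0).permutation(arr.shape[0])
shuffled = arr[arr_perm]
shares = np.shares_memory(arr, shuffled)
print(shares)
```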