3

Given a matrix A, I want to apply a different random shuffle to each row of A. For example,

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

becomes

array([[1, 3, 2],
       [6, 5, 4],
       [7, 9, 8]])

Of course, we can loop over the matrix and shuffle each row individually, but iteration is slow. Is there a more efficient way to do this?

cs95
Tony
  • Another answer [here](https://stackoverflow.com/questions/21010947/fast-column-shuffle-of-each-row-numpy). The comments there also suggest `apply_along_axis `. Another answer for columns is [here](https://stackoverflow.com/questions/26975807/efficient-way-to-shuffle-one-column-at-the-time-in-numpy-matrix) and [here](https://stackoverflow.com/questions/20546419/shuffle-columns-of-an-array-with-numpy) and [here](https://stackoverflow.com/questions/36272992/numpy-random-shuffle-by-row-independently) – Sheldore Jun 10 '19 at 23:11
  • And one more [here for column as well](https://stackoverflow.com/questions/35646908/numpy-shuffle-multidimensional-array-by-row-only-keep-column-order-unchanged) – Sheldore Jun 10 '19 at 23:15

2 Answers

5

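I picked up this neat trick from Divakar, which uses randn and argsort: argsorting each row of random values yields an independent random permutation of that row's column indices.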

import numpy as np

np.random.seed(0)

s = np.arange(16).reshape(4, 4)
# argsort of random keys gives one random column order per row
np.take_along_axis(s, np.random.randn(*s.shape).argsort(axis=1), axis=1)

array([[ 1,  0,  3,  2],
       [ 4,  6,  5,  7],
       [11, 10,  8,  9],
       [14, 12, 13, 15]])

For a 2D array, this can be simplified to

s[np.arange(len(s))[:,None], np.random.randn(*s.shape).argsort(axis=1)]

array([[ 1,  0,  3,  2],
       [ 4,  6,  5,  7],
       [11, 10,  8,  9],
       [14, 12, 13, 15]])
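To see why the trick works, it may help to split it into steps. This is only a sketch; the names `keys`, `order`, `shuffled`, `rows` and `same` are mine, not from the original answer. One random key is drawn per element, each row's keys are argsorted to get a permutation of the column indices, and the elements are then gathered in that order. Since the keys in a row are independent draws from the same continuous distribution, every ordering of a row should be equally likely.

```python
import numpy as np

np.random.seed(0)
s = np.arange(16).reshape(4, 4)

# One random sort key per element.
keys = np.random.randn(*s.shape)

# Argsorting each row's keys yields an independent permutation
# of the column indices 0..n_cols-1 for every row.
order = keys.argsort(axis=1)

# Gather the elements of s, row by row, in that column order.
shuffled = np.take_along_axis(s, order, axis=1)

# The fancy-indexing form selects the same elements for a 2D array.
rows = np.arange(len(s))[:, None]
same = s[rows, order]
```

Each row of `shuffled` holds exactly the values of the matching row of `s`, only reordered, and `same` is identical to `shuffled`.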

You can also apply np.random.permutation over each row independently to return a new array.

np.apply_along_axis(np.random.permutation, axis=1, arr=s)

array([[ 3,  1,  0,  2],
       [ 4,  6,  5,  7],
       [ 8,  9, 10, 11],
       [15, 14, 13, 12]])

Performance:

s = np.arange(10000 * 100).reshape(10000, 100) 

%timeit s[np.arange(len(s))[:,None], np.random.randn(*s.shape).argsort(axis=1)] 
%timeit np.apply_along_axis(np.random.permutation, 1, s)   

84.6 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
842 ms ± 8.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I've noticed that the relative performance depends on the dimensions of your data, so make sure to test it on your own arrays first.
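If you are not in IPython, a rough way to run that comparison on a few shapes is the standard `timeit` module. This is a minimal sketch: the helper names `argsort_shuffle`, `apply_shuffle` and `compare`, and the shapes in the loop, are arbitrary choices of mine, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def argsort_shuffle(a):
    """Shuffle each row independently via the randn + argsort trick."""
    idx = np.random.randn(*a.shape).argsort(axis=1)
    return np.take_along_axis(a, idx, axis=1)


def apply_shuffle(a):
    """Shuffle each row independently with np.random.permutation."""
    return np.apply_along_axis(np.random.permutation, 1, a)


def compare(shape, repeats=3):
    """Return best-of-`repeats` seconds per call for both approaches."""
    a = np.arange(shape[0] * shape[1]).reshape(shape)
    t_argsort = min(timeit.repeat(lambda: argsort_shuffle(a), number=1, repeat=repeats))
    t_apply = min(timeit.repeat(lambda: apply_shuffle(a), number=1, repeat=repeats))
    return t_argsort, t_apply


# Hypothetical shapes; swap in the dimensions you actually care about.
for shape in [(3, 3), (100, 100), (1000, 100)]:
    t_argsort, t_apply = compare(shape)
    print(shape, f"argsort: {t_argsort:.2e} s", f"apply_along_axis: {t_apply:.2e} s")
```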

cs95
  • Thanks! So if I got a 3D array and if I want to permute the last dimension, then I can do `np.take_along_axis(s, np.random.randn(*s.shape).argsort(axis=2), axis=2)`, right? – Tony Jun 10 '19 at 23:28
  • @Tony Yes, I think that should work. – cs95 Jun 10 '19 at 23:30
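A quick way to check the 3D case from the comments is to shuffle along the last axis and confirm that every slice still contains the same values. This is a small sketch with a made-up 2x3x4 array; `s3`, `idx` and `shuffled3` are illustrative names.

```python
import numpy as np

np.random.seed(0)
s3 = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Independent random ordering along the last axis for every (i, j) slice.
idx = np.random.randn(*s3.shape).argsort(axis=2)
shuffled3 = np.take_along_axis(s3, idx, axis=2)

# Each length-4 slice of s3 is already ascending, so sorting the shuffled
# slices should give s3 back if only the order within slices changed.
print(np.array_equal(np.sort(shuffled3, axis=2), s3))  # True
```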
0

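Code-wise, you can use NumPy's apply_along_axis with np.random.shuffle (which shuffles in place and returns None, so for a NumPy array the result is left in matrix rather than in the return value) as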

np.apply_along_axis(np.random.shuffle, 1, matrix)

but it doesn't seem to be more efficient than iterating, at least for a 3x3 matrix. For that method I get

> %%timeit 
> np.apply_along_axis(np.random.shuffle, 1, test)
67 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

while the iteration gives

> %%timeit
> for i in range(test.shape[0]):
>     np.random.shuffle(test[i])
20.3 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
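Not part of the timings above, but worth knowing: newer NumPy versions (1.20 and later, as far as I know) have `Generator.permuted`, which shuffles every slice along an axis independently without a Python-level loop. A minimal sketch, where the seed and the 3x3 `test` array are just stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
test = np.arange(9).reshape(3, 3)

# Out-of-place: returns a new array with each row shuffled independently.
shuffled = rng.permuted(test, axis=1)

# In-place: passing the array itself as `out` overwrites it,
# similar in effect to calling np.random.shuffle on every row.
rng.permuted(test, axis=1, out=test)
```

I have not timed this against the loop, so treat it as an option to benchmark rather than a guaranteed speedup.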
vlizana
  • `apply_along_axis` is essentially just iterate over the 'other' axes. No speed promises. It makes iteration prettier for 3d and larger; does nothing for 2d. – hpaulj Jun 10 '19 at 23:04