What does x=x[class_ids] do when used on NumPy arrays

Question

I am learning Python and solving a machine learning problem.

class_ids=np.arange(self.x.shape[0])
np.random.shuffle(class_ids)
self.x=self.x[class_ids]

This code is supposed to shuffle the data with NumPy, but I can't understand what self.x=self.x[class_ids] means, because I thought it just assigns the value of the array to a variable.

Print `self.x` before and after the assignment statement. What does it do? — wwii, Aug 24 '17 at 02:56
[Integer array indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing) — wwii, Aug 24 '17 at 02:59
The intent of the code seems to be to shuffle `self.x` in which case I would assume `np.random.shuffle(self.x)` would suffice — Some Guy, Aug 24 '17 at 03:13
The code above is a basic step in loading the Omniglot dataset and shuffling it for training or testing. self.x holds the image data, reshaped with self.x = np.reshape(self.x, newshape=(1622, 20, 28, 28, 1)) @PaulRooney — Charles_XC, Aug 24 '17 at 06:33

score 1 · Answer 1 · answered Aug 24 '17 at 05:17

It's an unnecessarily complicated way to shuffle self.x along its first axis. For example:

>>> x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
>>> x
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])

Then, using the approach from the question:

>>> class_ids=np.arange(x.shape[0])  # create an array [0, 1, 2, 3, 4]
>>> np.random.shuffle(class_ids)     # shuffle the array
>>> x[class_ids]                     # use integer array indexing to shuffle x
array([[5, 5],
       [3, 3],
       [1, 1],
       [4, 4],
       [2, 2]])
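The same three lines work unchanged on the 5-D Omniglot array described in the comments, because indexing with a 1-D integer array only reorders the first axis; everything below it (the 20 samples per class and their pixels) moves as a block. A minimal sketch, using a small random array as a hypothetical stand-in for the real (1622, 20, 28, 28, 1) data and a seeded generator so the run is reproducible:

```python
import numpy as np

# Hypothetical stand-in for the Omniglot array: 6 classes, 4 samples per class,
# 2x2 single-channel "images" (the real data is (1622, 20, 28, 28, 1)).
rng = np.random.default_rng(0)
x = rng.random((6, 4, 2, 2, 1))

class_ids = np.arange(x.shape[0])  # [0, 1, 2, 3, 4, 5]
rng.shuffle(class_ids)             # shuffle the index array in place
x_shuffled = x[class_ids]          # reorder along axis 0 only

# The shape is unchanged, and each class block moves as a unit:
# position i of the result holds the whole block that was at class_ids[i].
print(x_shuffled.shape)  # (6, 4, 2, 2, 1)
for i, src in enumerate(class_ids):
    assert np.array_equal(x_shuffled[i], x[src])
```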

Note that the same result can be achieved with np.random.shuffle alone, because its docstring explicitly says:

This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same.

>>> np.random.shuffle(x)
>>> x
array([[5, 5],
       [3, 3],
       [1, 1],
       [2, 2],
       [4, 4]])

or by using np.random.permutation:

>>> class_ids = np.random.permutation(x.shape[0])  # shuffled indices of the first dimension
>>> x[class_ids]
array([[2, 2],
       [4, 4],
       [3, 3],
       [5, 5],
       [1, 1]])
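One common reason to shuffle an index array instead of the data itself (an assumption about the original code's intent, which the question does not state) is that a single permutation can reorder several arrays consistently, for example images and their labels. A minimal sketch with hypothetical x and labels arrays, using NumPy's Generator API:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

# Hypothetical data: 5 samples with 2 features each, and one label per sample.
x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
labels = np.array([10, 20, 30, 40, 50])

perm = rng.permutation(x.shape[0])  # one shuffled index array...
x, labels = x[perm], labels[perm]   # ...applied to both arrays

# Every row still lines up with its original label (label == 10 * row value).
assert all(label == row[0] * 10 for row, label in zip(x, labels))
```

Calling np.random.shuffle separately on each array would instead produce two unrelated orders and break the pairing.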

score 0 · Accepted Answer · answered Aug 24 '17 at 03:09

Assuming self.x is a numpy array:

class_ids is a 1-D NumPy array that is used as an integer array index in the expression self.x[class_ids]. Because the previous line shuffled class_ids, self.x[class_ids] evaluates to a new array containing the rows of self.x in shuffled order. The assignment self.x=self.x[class_ids] then rebinds self.x to that shuffled array.
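To see exactly what the indexing expression produces, here is a small sketch with a fixed (not random) index array, so the output is predictable. Row i of the result is row class_ids[i] of the original, and the assignment makes the name point at the new array; the attribute self.x in the question behaves the same way as the plain name x used here:

```python
import numpy as np

x = np.arange(10).reshape(5, 2)        # rows: [0 1], [2 3], [4 5], [6 7], [8 9]
class_ids = np.array([4, 2, 0, 3, 1])  # a fixed "shuffle" for illustration

before = x
x = x[class_ids]   # builds a new array, then rebinds the name x to it
print(x)
# [[8 9]
#  [4 5]
#  [0 1]
#  [6 7]
#  [2 3]]
print(x is before)  # False: the name now refers to a different array
print(before[0])    # [0 1]: the old array was not modified
```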
