
I am learning Python and solving a machine learning problem.

class_ids=np.arange(self.x.shape[0])
np.random.shuffle(class_ids)
self.x=self.x[class_ids]

This shuffles the data with NumPy, but I can't understand what self.x=self.x[class_ids] means, because I thought it just assigns the value of the array to a variable.

MSeifert
Charles_XC
  • Print `self.x` before and after the assignment statement. What does it do? – wwii Aug 24 '17 at 02:56
  • [Integer array indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing) – wwii Aug 24 '17 at 02:59
  • What is `self.x`? Can you show its initialization? – Paul Rooney Aug 24 '17 at 03:04
  • The intent of the code seems to be to shuffle `self.x`, in which case I would assume `np.random.shuffle(self.x)` would suffice – Some Guy Aug 24 '17 at 03:13
  • The code above is the basic step of loading the Omniglot dataset and shuffling it for training or testing. `self.x` holds the image data and is shaped with `self.x = np.reshape(self.x, newshape=(1622, 20, 28, 28, 1))` @PaulRooney – Charles_XC Aug 24 '17 at 06:33

2 Answers


It's a very complicated way to shuffle the first dimension of your self.x. For example:

>>> x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
>>> x
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])

Then, using the approach from the question:

>>> class_ids=np.arange(x.shape[0])  # create an array [0, 1, 2, 3, 4]
>>> np.random.shuffle(class_ids)     # shuffle the array
>>> x[class_ids]                     # use integer array indexing to shuffle x
array([[5, 5],
       [3, 3],
       [1, 1],
       [4, 4],
       [2, 2]])

Note that the same could be achieved just by using np.random.shuffle because the docstring explicitly mentions:

This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same.

>>> np.random.shuffle(x)
>>> x
array([[5, 5],
       [3, 3],
       [1, 1],
       [2, 2],
       [4, 4]])

or by using np.random.permutation:

>>> class_ids = np.random.permutation(x.shape[0])  # shuffled indices of the first dimension
>>> x[class_ids]
array([[2, 2],
       [4, 4],
       [3, 3],
       [5, 5],
       [1, 1]])
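
The same holds for arrays with more dimensions, like the one described in the comments. A minimal sketch, assuming a much smaller made-up shape in place of (1622, 20, 28, 28, 1), suggests that only the first axis is reordered while each sub-array stays intact:

```python
import numpy as np

# Small stand-in for the Omniglot array (assumption: the real shape is
# (1622, 20, 28, 28, 1); a tiny one is used here for illustration).
x = np.arange(6 * 2 * 3 * 3 * 1).reshape(6, 2, 3, 3, 1)

class_ids = np.random.permutation(x.shape[0])  # shuffled indices 0..5
shuffled = x[class_ids]                        # integer array indexing along axis 0

# The shape is unchanged, and entry i of the result is entry class_ids[i] of x.
same_shape = shuffled.shape == x.shape
rows_match = all(
    np.array_equal(shuffled[i], x[class_ids[i]]) for i in range(len(class_ids))
)
print(same_shape, rows_match)  # True True
```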
MSeifert

Assuming self.x is a numpy array:

class_ids is a 1-d numpy array that is used as an integer array index in the expression self.x[class_ids]. Because the previous line shuffled class_ids, self.x[class_ids] evaluates to a new array holding the entries of self.x along its first axis (its rows, for a 2-d array) in shuffled order. The assignment self.x=self.x[class_ids] then rebinds self.x to that shuffled array.
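
A small sketch of this, where the Dataset class is a hypothetical stand-in for whatever object owns self.x in the question: the indexing builds a new array and the assignment points self.x at it, so the array that was passed in keeps its original order.

```python
import numpy as np

class Dataset:  # hypothetical stand-in for the class in the question
    def __init__(self, x):
        self.x = x

    def shuffle(self):
        class_ids = np.arange(self.x.shape[0])
        np.random.shuffle(class_ids)   # shuffles the index array in place
        self.x = self.x[class_ids]     # new array; self.x now refers to it
        return class_ids

original = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
data = Dataset(original)
class_ids = data.shuffle()

print(data.x is original)                           # False: self.x was rebound to a new array
print(np.array_equal(data.x, original[class_ids]))  # True
print(original[:, 0].tolist())                      # [1, 2, 3, 4, 5]: the input is untouched
```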

wwii