15

I have two numpy arrays: the first contains data and the second contains labels. I want to shuffle the data together with their labels. In other words, how can I shuffle my labels and data in the same order?

import numpy as np
data=np.genfromtxt("dataset.csv", delimiter=',')
classes=np.genfromtxt("labels.csv",dtype=np.str , delimiter='\t')

x=np.random.shuffle(data)
y=x[classes]

Does this preserve the order of shuffling?

vincent
  • You can combine data and class labels together, shuffle them [The order is preserved] and then separate them as input x and label y. – iun1x Aug 15 '19 at 22:14

4 Answers

34

Generate a random ordering of indices with np.random.permutation and index into both arrays, data and classes, with it:

idx = np.random.permutation(len(data))
x,y = data[idx], classes[idx]
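
As a minimal sketch of the same idea, the snippet below uses small synthetic arrays in place of the CSV files (the values and the seed are assumptions for illustration only). It also shows why the code in the question cannot work: np.random.shuffle shuffles in place and returns None.

```python
import numpy as np

# Synthetic stand-ins for the CSV contents (hypothetical values).
data = np.arange(10).reshape(5, 2)              # row i is [2*i, 2*i + 1]
classes = np.array(["a", "b", "c", "d", "e"])   # label i belongs to row i

rng = np.random.default_rng(0)                  # seeded only for repeatability
idx = rng.permutation(len(data))                # random ordering of 0..len(data)-1

# Indexing both arrays with the same index array keeps rows and labels paired.
x, y = data[idx], classes[idx]

# np.random.shuffle works in place and returns None,
# so `x = np.random.shuffle(data)` would leave x set to None.
result = np.random.shuffle(data.copy())
```

Because data[idx] and classes[idx] are new arrays, the original data and classes stay in their original order.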
Divakar
1

An easier way is to use sklearn, which shuffles several arrays consistently in a single call:

from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
Ali karimi
0

Alternatively you can concatenate the data and labels together, shuffle them and then separate them into input x and label y as shown below:

import numpy as np

def read_data(filename, delimiter, datatype=float): # Read data from a file
    return np.genfromtxt(filename, delimiter=delimiter, dtype=datatype)

classes = read_data('labels.csv', delimiter='\t', datatype=str)
data = read_data('data.csv', delimiter=',')
# Append the labels as the last column; an array has a single dtype, so cast the data to str first
dataset = np.column_stack((data.astype(str), classes))

def dataset_shuffle(dataset): # Returns separated shuffled data and classes from dataset
    np.random.shuffle(dataset) # Shuffles the rows in place
    n, m = dataset.shape
    x = dataset[:, 0:m-1].astype(float) # Convert the feature columns back to numbers
    y = dataset[:, m-1]
    return x, y # Return shuffled x and y with preserved order
iun1x
0

You can use the zip function:

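import numpy as np
data=np.genfromtxt("dataset.csv", delimiter=',')
classes=np.genfromtxt("labels.csv", dtype=str, delimiter='\t') # np.str was removed in NumPy 1.24; use the builtin str

temp = list(zip(data, classes))
np.random.shuffle(temp)
data, classes = zip(*temp) # data and classes are now tuples; wrap them in np.array(...) if arrays are needed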
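
A small sketch of the zip approach on synthetic arrays (the values are made up for illustration) shows one detail worth knowing: unzipping yields tuples, so converting back to arrays is an extra step if array operations are needed afterwards.

```python
import numpy as np

# Synthetic stand-ins for the CSV contents (hypothetical values).
data = np.array([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]])
classes = np.array(["zero", "one", "two"])

# Pair each row with its label, then shuffle the list of pairs in place.
pairs = list(zip(data, classes))
np.random.shuffle(pairs)

# Unzip into two tuples and convert each back to a NumPy array.
shuffled_rows, shuffled_labels = zip(*pairs)
data_shuffled = np.array(shuffled_rows)
classes_shuffled = np.array(shuffled_labels)
```

For large arrays this is likely slower than indexing with a permutation, since every row passes through a Python list.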