
I am trying to use `np.array_split` to split the dataset into 2 parts, but it does not work as expected.

I hope someone can offer some advice on this issue.

`x` (images tensor) and `y` (labels) should have the same length. Found: x.shape = (14218, 32, 32, 3), y.shape = (2, 7109, 10)

Code:

y_train = utils.to_categorical(y_train_data, number_of_classes)  # one-hot encoding
y_test = utils.to_categorical(y_test_data, number_of_classes)   # one-hot encoding
# Inspect the label of one sample
print('The corresponding class is 7\n', y_train[1])

'''clients_num = 2
X_train = np.array_split(X_train, clients_num)
y_train = np.array_split(y_train, clients_num)
print(np.shape(y_train))'''

input_shape = (img_rows, img_cols, 1)

rgb_batch = np.repeat(X_train_data[..., np.newaxis], 3, -1)
rgb_batch1 = np.repeat(X_test_data[..., np.newaxis], 3, -1)

X_train = tf.image.resize(rgb_batch, (32, 32))
X_test = tf.image.resize(rgb_batch1, (32, 32))

# tf.dtypes.cast returns a new tensor, so the result must be assigned
X_train = tf.dtypes.cast(X_train, tf.float32)
X_test = tf.dtypes.cast(X_test, tf.float32)

X_train /= 255.0
X_test /= 255.0
Jovan Mei

1 Answer


If I understand correctly, X_train and Y_train are NumPy arrays representing your dataset. To divide them into random parts, you could, for instance, shuffle the dataset and then give the first portion of the shuffled data to the first client and the remainder to the second client:

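# Shuffle one set of indices so that X and Y stay aligned
rand_indexes = np.arange(len(X_train))
np.random.shuffle(rand_indexes)
X_rand = X_train[rand_indexes]
Y_rand = Y_train[rand_indexes]
# num_samples_1 is the number of samples given to the first client
X_1_train = X_rand[:num_samples_1]
Y_1_train = Y_rand[:num_samples_1]
# The second client receives the remaining samples
X_2_train = X_rand[num_samples_1:]
Y_2_train = Y_rand[num_samples_1:]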
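
To make the idea above easy to try in isolation, here is a minimal, self-contained sketch that uses small synthetic arrays in place of the real images and labels. The helper name `split_two_clients`, the `seed` parameter, and the 5/5 split are assumptions made for this example; they do not come from the question.

```python
import numpy as np


def split_two_clients(X, Y, num_samples_1, seed=0):
    """Shuffle X and Y with one shared permutation, then cut them in two.

    Hypothetical helper for illustration only.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))  # one permutation applied to both arrays
    X_rand, Y_rand = X[perm], Y[perm]
    client_1 = (X_rand[:num_samples_1], Y_rand[:num_samples_1])
    client_2 = (X_rand[num_samples_1:], Y_rand[num_samples_1:])
    return client_1, client_2


# Synthetic stand-in data: sample i is an image filled with the value i,
# and its one-hot label is class i, so alignment is easy to verify.
X_train = np.arange(10, dtype=np.float32).reshape(10, 1, 1, 1) * np.ones(
    (1, 32, 32, 3), dtype=np.float32
)
Y_train = np.eye(10, dtype=np.float32)

(X_1, Y_1), (X_2, Y_2) = split_two_clients(X_train, Y_train, num_samples_1=5)
print(X_1.shape, Y_1.shape)
print(X_2.shape, Y_2.shape)
```

Each client ends up with an image array and a label array of the same length, which is what `model.fit` expects. If the images are a `tf.Tensor` rather than a NumPy array, converting them with `.numpy()` before indexing is one option.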
jackve
  • Is the dataset you want to split a TensorFlow dataset or NumPy arrays? - “After changing the dimension, it is a NumPy array” – Jovan Mei Feb 20 '21 at 12:09
  • Thanks for your explanation, I will try it. – Jovan Mei Feb 20 '21 at 12:09
  • The split into 2 parts succeeded, but when I pass the data to `model.fit`, I get "ValueError: Shapes (None, 1) and (None, 10) are incompatible". I really don't know where the 10 comes from. Do you have any ideas? – Jovan Mei Feb 20 '21 at 14:35
  • That depends on your model's output layer. Make sure that the number of classes in your dataset is equal to the number of output classes of your model. It seems like they are different (10 and 1) in your case – jackve Feb 20 '21 at 14:41
  • I checked it: `y_test` has shape (3550, 10), and the num_classes in the model is also 10, but the error still happens – Jovan Mei Feb 20 '21 at 15:10
  • Sorry to bother you, but should I also split `X_test` into 2 parts, or leave it as it was? – Jovan Mei Feb 20 '21 at 15:58
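
The shapes discussed in this thread can be checked before calling `model.fit`. The sketch below uses synthetic one-hot labels of the size reported in the question to show that `np.array_split` returns a Python list of arrays, which is why `np.shape` of the result reports (2, 7109, 10); each client's labels are obtained by indexing that list. The helper `check_shapes` and the 4-feature stand-in for the images are hypothetical and exist only for this illustration.

```python
import numpy as np

num_classes = 10
num_samples = 14218

# Synthetic stand-ins: small feature vectors instead of 32x32x3 images,
# and one-hot labels with 10 classes.
x = np.zeros((num_samples, 4), dtype=np.float32)
y = np.eye(num_classes, dtype=np.float32)[np.arange(num_samples) % num_classes]

# np.array_split returns a list with one array per client.
x_parts = np.array_split(x, 2)
y_parts = np.array_split(y, 2)
print(type(y_parts).__name__, np.shape(y_parts))  # list (2, 7109, 10)


def check_shapes(x_client, y_client, num_classes):
    """Hypothetical pre-flight check to run before model.fit."""
    if len(x_client) != len(y_client):
        raise ValueError(
            f"length mismatch: {len(x_client)} samples vs {len(y_client)} labels"
        )
    if y_client.ndim != 2 or y_client.shape[1] != num_classes:
        raise ValueError(
            f"labels have shape {y_client.shape}, expected (N, {num_classes})"
        )


# Pass one client's arrays at a time, not the whole list.
for x_client, y_client in zip(x_parts, y_parts):
    check_shapes(x_client, y_client, num_classes)
```

Labels of shape (N, 1), such as integer class ids that were never one-hot encoded, would fail the second check. That is one possible source of a "(None, 1) and (None, 10)" mismatch, although this is only a guess because the model code is not shown in the thread.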