I'm trying to figure out how to split the data according to the following requirement so that I can run a CNN on it:

Split the training/testing dataset into two sets: one with class labels < 5 and one with class labels >= 5. Print out the shapes of the resulting two sets from both training and testing datasets.

import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.utils import to_categorical
from tensorflow import keras

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

The code above is how I'm loading the data. Below is my attempt, but I'm not sure it's right, since the training images still have a shape of (50000, 32, 32, 3). Can anyone help me figure this out?

train_labels_first = train_labels[train_labels < 5]
test_labels_first = test_labels[test_labels < 5]


train_labels_second = train_labels[train_labels >= 5]
test_labels_second = test_labels[test_labels >= 5]
runner16

1 Answer

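Apply boolean indexing to your train and test images. The CIFAR-10 labels have shape (N, 1), so flatten the comparison into a 1-D boolean mask and index the images with the mask itself, not with the selected label values. For example:

train_mask = train_labels.flatten() < 5
test_mask = test_labels.flatten() < 5
train_images_first = train_images[train_mask]
test_images_first = test_images[test_mask]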

print(train_images_first.shape, test_images_first.shape)
>>> (25000, 32, 32, 3) (5000, 32, 32, 3)

To get the matching labels, assign train_labels[train_labels < 5] to a new variable; it holds the labels below 5. Use >= 5 in the same way for the second set.
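
If it helps, here is a minimal sketch of the whole split as a reusable function. The name split_by_label and the small synthetic arrays are assumptions for illustration only; they stand in for the real CIFAR-10 arrays so the snippet runs without downloading anything. The only thing it relies on is that the labels come as an (N, 1) array, as tf.keras.datasets.cifar10.load_data() returns them.

```python
import numpy as np


def split_by_label(images, labels, threshold=5):
    """Split (images, labels) into labels < threshold and labels >= threshold.

    Hypothetical helper, not part of Keras or NumPy.
    """
    labels = np.asarray(labels)
    # Flatten (N, 1) labels to (N,) so the mask selects along the first axis only.
    mask = labels.reshape(-1) < threshold
    low = (images[mask], labels[mask])
    high = (images[~mask], labels[~mask])
    return low, high


# Synthetic stand-in for CIFAR-10: 100 images, every pixel of image i set to i,
# with labels cycling through 0..9 and shaped (N, 1) like the real dataset.
images = np.arange(100, dtype=np.uint8).reshape(-1, 1, 1, 1) * np.ones(
    (1, 32, 32, 3), dtype=np.uint8
)
labels = (np.arange(100) % 10).reshape(-1, 1)

(low_images, low_labels), (high_images, high_labels) = split_by_label(images, labels)

print(low_images.shape, low_labels.shape)    # (50, 32, 32, 3) (50, 1)
print(high_images.shape, high_labels.shape)  # (50, 32, 32, 3) (50, 1)
```

With the real data you would call it once on (train_images, train_labels) and once on (test_images, test_labels). Keeping the mask in one place means the images and labels of each subset stay aligned.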

Miguel Trejo