Do not understand the classes part and reshape from reading a h5 dataset file

Question

Hello, can somebody explain step by step what's happening in the following code? Especially the classes part and the reshape. Thanks.

import h5py
import numpy as np

def load_data():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

score 1 · Answer 1 · answered Jun 07 '18 at 06:56

Most of the lines just load datasets from the h5 file. The np.array(...) wrapper isn't needed. test_dataset[name][:] is sufficient to load an array.

test_set_y_orig = test_dataset["test_set_y"][:]

test_dataset is the opened file. test_dataset["test_set_y"] is a dataset in that file. The [:] loads the dataset into a numpy array. See the h5py docs for more details on loading a dataset.
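To make the file / dataset / array distinction concrete, here is a minimal sketch. It does not use the course files: it writes a tiny throwaway h5 file first, and the path, the values, and the variable names are made up for illustration. Only the dataset name "test_set_y" is taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small throwaway file so the example is self-contained.
# The values are invented; only the dataset name mirrors the question.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("test_set_y", data=np.array([1, 0, 1, 1]))

with h5py.File(path, "r") as f:
    dset = f["test_set_y"]        # an h5py Dataset object (a handle into the file)
    loaded = dset[:]              # slicing reads the values into a numpy array
    wrapped = np.array(dset[:])   # same values; np.array only makes another copy

dset_type_name = type(dset).__name__
print(dset_type_name, type(loaded).__name__, loaded.shape)
```

Because the slice already returns a numpy array, loaded and wrapped hold the same data, which is why the np.array(...) wrapper can be dropped.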

I deduce from

train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))

that the array, as loaded, is 1d, with shape (n,), and that this reshape just adds a leading dimension, making it (1,n). I would have coded it as

train_set_y_orig = train_set_y_orig[None,:]

but the result is the same.
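A quick way to check that the two spellings are equivalent is to run them on a small stand-in array. The label values below are made up; reshape(1, -1) is a third common spelling, included only for comparison.

```python
import numpy as np

y = np.array([1, 0, 1, 1])        # stand-in labels, 1d with shape (4,)

a = y.reshape((1, y.shape[0]))    # the form used in the question
b = y[None, :]                    # None (same as np.newaxis) inserts a new axis
c = y.reshape(1, -1)              # -1 lets numpy infer the remaining length

print(y.shape, a.shape, b.shape, c.shape)
print(np.array_equal(a, b), np.array_equal(a, c))
```

All three results have shape (1, 4) and contain the same values; only the number of dimensions differs from the original y.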

There's nothing special about the classes array (though it might well be an array of strings).
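If classes does turn out to hold strings, a typical use is to map a numeric label back to a readable name. The sketch below is an assumption, not a description of the actual file: the two byte-string values and the label row are invented, and the real list_classes dataset may have different contents or a different dtype.

```python
import numpy as np

# Assumed contents, for illustration only.
classes = np.array([b"non-cat", b"cat"])
y = np.array([[1, 0, 1]])         # labels in the (1, n) layout from the question

# Use each label as an index into classes, then decode bytes to str.
names = [classes[label].decode("utf-8") for label in y[0]]
print(names)
```

Under these assumptions, label 0 selects the first entry of classes and label 1 the second.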
