How to re-write tensorflow code to make model training faster?

Question

QUESTION: My training is super slow. How do I rewrite my code to make my deep learning model training faster?

BACKGROUND: I have built a CNN with TensorFlow 2.8.1 to classify CIFAR-100 images using a custom loss function. The CIFAR-100 dataset consists of 32x32-pixel RGB images in 100 fine classes (e.g., bear, car) grouped into 20 coarse classes (e.g., large omnivore, vehicle). My custom loss function is a weighted sum of two other loss functions (see code below): the crossentropy loss for the fine label and the crossentropy loss for the coarse label. My hope is that this custom loss will enforce accurate classification of the coarse label and thereby yield a more accurate classification of the fine label (fingers crossed). The comparator will be the crossentropy loss of just the fine label (the baseline model).

Note that to derive the coarse (hierarchical) loss component, I had to map y_true (true fine label, integer) and y_pred (predicted softmax probabilities for the fine labels, vector) to y_true_coarse_int (true coarse label, integer) and y_pred_coarse_hot (predicted coarse label, one-hot encoded vector), respectively. FineInts_to_CoarseInts is a Python dictionary that provides this mapping.

Training takes more than 5 hours with the custom loss function, whereas training with the regular crossentropy loss for the fine classes takes about 1 hour. The code was run on a high-performance computing cluster with a 32 GB CPU and 1 GPU.

See below:

# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION

def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true,  -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)

    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)

    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss

During model compilation I had to set the run_eagerly parameter to True. See below:

# THIS CODE CELL IS TO COMPILE THE MODEL

model.compile(optimizer="adam", loss=custom_loss, metrics="accuracy", run_eagerly=True)
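
A likely reason run_eagerly=True is needed is that hierarchical_loss calls K.eval on one element at a time inside Python list comprehensions, which requires concrete (eager) values and runs a Python loop over every batch. Below is a minimal sketch of a loss built only from TensorFlow ops. Note that it swaps in a different formulation: instead of taking the argmax and one-hot encoding it (which passes no gradient), it sums the fine-class probabilities within each coarse class. The names placeholder_map, fine_to_coarse and membership are assumptions made for illustration, and the placeholder mapping is not the real CIFAR-100 hierarchy.

```python
import tensorflow as tf

NUM_FINE, NUM_COARSE = 100, 20

# ASSUMPTION: placeholder mapping (fine class i -> coarse class i // 5) so the
# sketch runs on its own. In the notebook, use FineInts_to_CoarseInts instead.
placeholder_map = {i: i // 5 for i in range(NUM_FINE)}

# 1-D lookup tensor: position i holds the coarse label of fine label i.
fine_to_coarse = tf.constant(
    [placeholder_map[i] for i in range(NUM_FINE)], dtype=tf.int32
)

# (100, 20) membership matrix: row i is the one-hot coarse class of fine class i.
membership = tf.one_hot(fine_to_coarse, depth=NUM_COARSE, dtype=tf.float32)

scce = tf.keras.losses.SparseCategoricalCrossentropy()


def hierarchical_loss(y_true, y_pred):
    # Map the whole batch of true fine labels to coarse labels in one gather.
    fine_true = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
    coarse_true = tf.gather(fine_to_coarse, fine_true)
    # Sum fine-class probabilities per coarse class: (batch, 100) x (100, 20).
    coarse_pred = tf.matmul(tf.cast(y_pred, tf.float32), membership)
    return scce(coarse_true, coarse_pred)


def custom_loss(y_true, y_pred, h=0.5):
    # Weighted sum of the fine-label and coarse-label crossentropy terms.
    fine_true = tf.reshape(y_true, [-1])
    return (1.0 - h) * scce(fine_true, y_pred) + h * hierarchical_loss(y_true, y_pred)
```

With a loss written this way, model.compile(optimizer="adam", loss=custom_loss, metrics="accuracy") should be able to run without run_eagerly=True. How much time that saves on this particular model is not measured here.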

The full code is below:

# THIS CODE CELL LOADS THE PACKAGES USED IN THIS NOTEBOOK

# Load core packages for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import sys

!{sys.executable} -m pip install pydot
!{sys.executable} -m pip install graphviz

# Load deep learning packages
import tensorflow                                                                           
from tensorflow.keras.datasets.cifar100 import load_data      
from tensorflow.keras import (Model, layers)          
from tensorflow.keras.losses import SparseCategoricalCrossentropy
import tensorflow.keras.backend as K
from tensorflow.keras.utils import (to_categorical, plot_model)
from tensorflow.lookup import (StaticHashTable, KeyValueTensorInitializer)

# Load model evaluation packages
import sklearn
from sklearn.metrics import (confusion_matrix, classification_report)

# Print versions of main ML packages
print("Tensorflow version " + tensorflow.__version__)
print("Scikit learn version " + sklearn.__version__)

# THIS CODE CELL LOADS DATASETS AND CHECKS DATA DIMENSIONS

# There is an option to load the "fine" (100 fine classes) or "coarse" (20 super classes) labels with integer (int) encodings
# We will load both labels for hierarchical classification tasks
(x_train, y_train_fine_int), (x_test, y_test_fine_int) = load_data(label_mode="fine")
(_, y_train_coarse_int), (_, y_test_coarse_int) = load_data(label_mode="coarse")

# EXTRACT DATASET PARAMETERS FOR USE LATER ON
num_fine_classes = 100
num_coarse_classes = 20
input_shape = x_train.shape[1:]  

# THIS CODE CELL PROVIDES THE CODE TO LINK INTEGER LABELS TO MEANINGFUL WORD LABELS
# Fine and coarse labels are provided as integers.  We will want to link them both to meaningful word labels.



# CREATE A DICTIONARY TO MAP THE 20 COARSE LABELS TO THE 100 FINE LABELS

# This mapping comes from https://keras.io/api/datasets/cifar100/ 
# Except "computer keyboard" should just be "keyboard" for the encoding to work
CoarseLabels_to_FineLabels = {
    "aquatic mammals":                  ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish":                             ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers":                          ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "food containers":                  ["bottles", "bowls", "cans", "cups", "plates"],
    "fruit and vegetables":             ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household electrical devices":     ["clock", "keyboard", "lamp", "telephone", "television"],
    "household furniture":              ["bed", "chair", "couch", "table", "wardrobe"],
    "insects":                          ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large carnivores":                 ["bear", "leopard", "lion", "tiger", "wolf"],
    "large man-made outdoor things":    ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes":     ["cloud", "forest", "mountain", "plain", "sea"],
    "large omnivores and herbivores":   ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "medium-sized mammals":             ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "non-insect invertebrates":         ["crab", "lobster", "snail", "spider", "worm"],
    "people":                           ["baby", "boy", "girl", "man", "woman"],
    "reptiles":                         ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "small mammals":                    ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "trees":                            ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1":                       ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2":                       ["lawn-mower", "rocket", "streetcar", "tank", "tractor"]
}

# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED COARSE LABEL TO THE WORD LABEL
# Create a list of the Coarse Labels
CoarseLabels = list(CoarseLabels_to_FineLabels.keys())

# The target variable in CIFAR-100 is encoded such that the coarse class is assigned an integer based on its alphabetical order
# The CoarseLabels list is already alphabetized, so no need to sort
CoarseInts_to_CoarseLabels = dict(enumerate(CoarseLabels))

# CREATE A DICTIONARY TO MAP THE WORD LABEL TO THE INTEGER-ENCODED COARSE LABEL
CoarseLabels_to_CoarseInts = dict(zip(CoarseLabels, range(20)))


# CREATE A DICTIONARY TO MAP THE 100 FINE LABELS TO THE 20 COARSE LABELS
FineLabels_to_CoarseLabels = {}
for CoarseLabel in CoarseLabels:
    for FineLabel in CoarseLabels_to_FineLabels[CoarseLabel]:
        FineLabels_to_CoarseLabels[FineLabel] = CoarseLabel
        
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABEL TO THE WORD LABEL
# Create a list of the Fine Labels
FineLabels = list(FineLabels_to_CoarseLabels.keys())

# The target variable in CIFAR-100 is encoded such that the fine class is assigned an integer based on its alphabetical order
# Sort the fine class list.  
FineLabels.sort()
FineInts_to_FineLabels = dict(enumerate(FineLabels))


# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
b = list(dict(sorted(FineLabels_to_CoarseLabels.items())).values())
FineInts_to_CoarseInts = dict(zip(range(100), [CoarseLabels_to_CoarseInts[i] for i in b]))

#Tensor version of dictionary
#fine_to_coarse = tensorflow.constant(list((FineInts_to_CoarseInts).items()), dtype=tensorflow.int8)
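
The commented-out line above hints at an array version of the dictionary. As a minimal sketch of that idea, a 1-D array whose position i holds the coarse label of fine label i can map a whole batch of labels with a single indexing operation. The toy mapping and the names toy_fine_to_coarse, lookup and fine_batch are hypothetical and only keep the example self-contained.

```python
import numpy as np

# ASSUMPTION: toy mapping (6 fine classes -> 3 coarse classes). In the notebook
# this role is played by FineInts_to_CoarseInts, which has 100 keys.
toy_fine_to_coarse = {0: 2, 1: 0, 2: 0, 3: 1, 4: 2, 5: 1}

# Position i of the array holds the coarse label of fine label i.
lookup = np.array(
    [toy_fine_to_coarse[i] for i in range(len(toy_fine_to_coarse))],
    dtype=np.int64,
)

# Integer labels shaped (batch, 1), the same layout as the CIFAR-100 label arrays.
fine_batch = np.array([[3], [0], [5], [1]])

# Integer-array indexing maps every label at once and keeps the (batch, 1) shape.
coarse_batch = lookup[fine_batch]
print(coarse_batch.ravel())
```

Applied to the real data, the same lookup gives a quick sanity check of the hand-built mapping: np.array_equal(lookup[y_train_fine_int], y_train_coarse_int) should be True if the dictionary agrees with the coarse labels shipped with the dataset.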




# THIS CODE CELL IS TO BUILD A FUNCTIONAL MODEL

inputs = layers.Input(shape=input_shape)
x = layers.BatchNormalization()(inputs)

x = layers.Conv2D(64, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)

x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)

x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)

x = layers.Conv2D(1024, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)

x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)

x = layers.Dense(512, activation = "relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)

output_fine = layers.Dense(num_fine_classes, activation="softmax", name="output_fine")(x)

model = Model(inputs=inputs, outputs=output_fine)



# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION

def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true,  -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)

    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)

    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss




# THIS CODE CELL IS TO COMPILE THE MODEL

model.compile(optimizer="adam", loss=crossentropy_loss, metrics="accuracy", run_eagerly=False)


# THIS CODE CELL IS TO TRAIN THE MODEL

history = model.fit(x_train, y_train_fine_int, epochs=200, validation_split=0.25, batch_size=100)


# THIS CODE CELL IS TO VISUALIZE THE TRAINING

history_frame = pd.DataFrame(history.history)
history_frame.to_csv("history.csv")
history_frame.loc[:, ["accuracy", "val_accuracy"]].plot()
history_frame.loc[:, ["loss", "val_loss"]].plot()
plt.show()


# THIS CODE CELL IS TO EVALUATE THE MODEL ON AN INDEPENDENT DATASET

score = model.evaluate(x_test, y_test_fine_int, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

slow and fast have no meaning here, you need to use numbers to define those. — Dr. Snoopy, Sep 20 '22 at 20:47
you need to use a GPU. A model of this size will be slow on cpu — AndrzejO, Sep 20 '22 at 21:14
@Dr.Snoopy - the absolute training times are now specified for custom and "baseline" loss functions. — Snehal Patel, Sep 20 '22 at 21:25
@AndrzejO - the hardware used is now indicated (32GB CPU + 1 GPU) — Snehal Patel, Sep 20 '22 at 21:26
@SnehalPatel My guess is that your tensorflow installation does not use the GPU. On google colab, your model takes a few sec per epoch — AndrzejO, Sep 20 '22 at 21:31
@AndrzejO, in the full code I provided, I realized that the model.fit method had crossentropy_loss, not custom_loss. That is why it was taking only a few seconds per epoch. With custom_loss it takes about 3X longer. — Snehal Patel, Sep 20 '22 at 23:47
@SnehalPatel With custom_loss your code gives an error message — AndrzejO, Sep 21 '22 at 07:23
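
On the question raised in the comments of whether the TensorFlow installation is using the GPU at all, one quick check is to list the devices TensorFlow can see. This is a minimal sketch; an empty list would suggest that training is running on the CPU.

```python
import tensorflow as tf

# List the GPUs that this TensorFlow installation can see.
gpus = tf.config.list_physical_devices("GPU")

print("TensorFlow version:", tf.__version__)
print("GPUs visible to TensorFlow:", gpus)
```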

score 0 · Answer 1 · answered Sep 21 '22 at 08:15

Quantization

Quantization is a technique that converts your model's numbers from float32 to int8, which makes the model smaller.
There are two types of quantization: before training and after training.
Try applying quantization before training and let me know the results.

Refer to this video for Quantization
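
As a rough illustration of the float32-to-int8 conversion described above, here is a minimal NumPy sketch of one simple scheme (symmetric, one scale per tensor). The function names quantize_int8 and dequantize are hypothetical, and this is not TensorFlow's quantization API. It only shows the size reduction and the rounding error, and says nothing about training time.

```python
import numpy as np


def quantize_int8(weights):
    """Map float32 values to int8 using a single scale for the whole tensor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("float32 bytes:", weights.nbytes)  # 4 bytes per value
print("int8 bytes:   ", q.nbytes)        # 1 byte per value
print("max abs error:", float(np.max(np.abs(weights - restored))))
```

The int8 copy takes a quarter of the memory, and each restored value differs from the original by at most about half of scale.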
