
I have a keras model for my data X. The code used is:

X=np.array(data[['tags1','prx1','prxcol1','p1','p2','p3']].values)
t=np.array(data.read.values)
n=np.array(data.read.values)

import keras

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf

from sklearn.model_selection import train_test_split
X_train, X_test, t_train, t_test =  train_test_split(X, t, test_size=0.2)

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

standard_transformer = Pipeline(steps=[
        ('standard', StandardScaler())])

minmax_transformer = Pipeline(steps=[
        ('minmax', MinMaxScaler())])

preprocessor = ColumnTransformer(
        remainder='passthrough', #passthough features not listed
        transformers=[
            ('std', standard_transformer, []), # note: empty column list, so nothing is standardized
            ('mm', minmax_transformer , slice(1,9))
        ])

X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

model = keras.models.Sequential([
    keras.layers.Dense(20, activation="tanh", input_shape=X_train.shape[1:]),
    keras.layers.Dense(1, activation="sigmoid")
])

model.summary() # call the method; `model.summary` alone does nothing

try: 
    model = keras.models.load_model("modelrfidX1.h5") # load saved model if one exists
except OSError:
    pass # no saved model yet; keep the freshly built one

early_stopping_cb = keras.callbacks.EarlyStopping(patience=100, restore_best_weights=True)
checkpoint_cb = keras.callbacks.ModelCheckpoint("modelrfidX1.h5", save_best_only=True)

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(X_train, t_train, epochs=50000, 
                    validation_data=(X_test, t_test), 
                    callbacks=[checkpoint_cb, early_stopping_cb])

print(history.params)

The history values it shows are like these:

Epoch 164/50000
26320/26320 [==============================] - 1s 44us/step - loss: 0.2543 - accuracy: 0.8786 - val_loss: 0.2692 - val_accuracy: 0.8669
Epoch 165/50000
26320/26320 [==============================] - 1s 39us/step - loss: 0.2541 - accuracy: 0.8790 - val_loss: 0.2621 - val_accuracy: 0.8705
Epoch 166/50000
26320/26320 [==============================] - 1s 39us/step - loss: 0.2548 - accuracy: 0.8782 - val_loss: 0.2658 - val_accuracy: 0.8701
Epoch 167/50000
26320/26320 [==============================] - 1s 39us/step - loss: 0.2541 - accuracy: 0.8782 - val_loss: 0.2686 - val_accuracy: 0.8673
Epoch 168/50000
26320/26320 [==============================] - 1s 40us/step - loss: 0.2534 - accuracy: 0.8780 - val_loss: 0.2651 - val_accuracy: 0.8684
Epoch 169/50000
26320/26320 [==============================] - 1s 39us/step - loss: 0.2552 - accuracy: 0.8778 - val_loss: 0.2645 - val_accuracy: 0.8689
Epoch 170/50000
26320/26320 [==============================] - 1s 40us/step - loss: 0.2554 - accuracy: 0.8766 - val_loss: 0.2620 - val_accuracy: 0.8711
Epoch 171/50000
26320/26320 [==============================] - 1s 40us/step - loss: 0.2538 - accuracy: 0.8779 - val_loss: 0.2777 - val_accuracy: 0.8611

I am plotting the loss and val_loss values against the epoch number, but I do not know how the y-axis should be labeled for these losses, or what units they are in. Since the loss is binary cross-entropy, I assume the y-axis label should be H and the units [bits], but I would like to be sure. I have searched the Keras documentation and research papers, but I still do not know how to label the y-axis.

1 Answer


The loss functions used to train neural nets are often surrogate loss functions, meaning they aren't the actual metric you're trying to optimize. Surrogate losses are used because the metric you actually care about is often non-differentiable. For example, binary cross-entropy is a surrogate for classification error, which does not change smoothly as the weights of the neural network do.
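A small NumPy sketch illustrates the point. The probabilities and the helper names below are illustrative, not from the question's model: as a predicted probability crosses the 0.5 decision threshold, the 0/1 classification error jumps in a step, while binary cross-entropy varies smoothly (and is differentiable), which is why it can be used for gradient-based training.

```python
import numpy as np

def binary_crossentropy(y_true, p):
    """Binary cross-entropy for a single prediction, clipped to avoid log(0)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def classification_error(y_true, p):
    """0/1 error: 1.0 if the thresholded prediction is wrong, else 0.0."""
    return float(np.round(p) != y_true)

# True label is 1; sweep the predicted probability across the threshold.
y = 1.0
for p in [0.45, 0.49, 0.51, 0.55]:
    # Error flips abruptly from 1.0 to 0.0 at p = 0.5,
    # while the cross-entropy decreases gradually.
    print(f"p={p}: error={classification_error(y, p)}, "
          f"bce={binary_crossentropy(y, p):.4f}")
```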

The loss function does not have meaningful units, and its value does not mean much on its own. You can't really compare the value of a loss function on one problem to its value on another problem. A graph of the loss function is really only useful for showing the learning trend (this is why such a graph is called a learning curve). Since these plots only show the trend, units are usually not given.

If you want a graph with meaningful units, plot the main metric you are trying to optimize instead. For a model trained with binary cross-entropy, that could be classification accuracy or classification error (both of which are percentages).
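As a minimal sketch of both options: plot the unitless surrogate loss for the trend, and accuracy (a fraction, or a percentage) when you want interpretable units. The numbers below are placeholders standing in for `history.history`, the per-epoch metrics dict returned by `model.fit()` in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Placeholder values; substitute history.history from model.fit().
hist = {
    "loss":         [0.40, 0.30, 0.26, 0.25],
    "val_loss":     [0.38, 0.31, 0.27, 0.27],
    "accuracy":     [0.80, 0.85, 0.87, 0.88],
    "val_accuracy": [0.81, 0.84, 0.86, 0.87],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: surrogate loss -- shows the trend only, so no units on the y-axis.
ax1.plot(hist["loss"], label="loss")
ax1.plot(hist["val_loss"], label="val_loss")
ax1.set_xlabel("epoch")
ax1.set_ylabel("binary cross-entropy (unitless)")
ax1.legend()

# Right: the metric of interest -- a fraction with a meaningful scale.
ax2.plot(hist["accuracy"], label="accuracy")
ax2.plot(hist["val_accuracy"], label="val_accuracy")
ax2.set_xlabel("epoch")
ax2.set_ylabel("accuracy (fraction)")
ax2.legend()

fig.savefig("learning_curves.png")
```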