
I'm training my model with Keras and I'm trying to read the evaluation statistics. I know what the loss function is for, but what is the highest value it can take? The closer to zero the better, but I don't know whether 0.2 is good. I can see the loss going down and the accuracy increasing as training runs for more iterations.

My code for training the model:

import numpy as np
import tensorflow as tf

def trainModel(bow, unitlabels, units):
    # Bag-of-words features and integer class labels as numpy arrays.
    x_train = np.array(bow)
    print("X_train: ", x_train)
    y_train = np.array(unitlabels)
    print("Y_train: ", y_train)
    # One hidden layer with dropout, softmax output over the unit classes.
    model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(256, activation=tf.nn.relu),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(len(units), activation=tf.nn.softmax)])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=50)
    return model

and my results:

Epoch 1/50
1249/1249 [==============================] - 0s 361us/sample - loss: 0.8800 - acc: 0.7590
Epoch 2/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.4689 - acc: 0.8519
Epoch 3/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.3766 - acc: 0.8687
Epoch 4/50
1249/1249 [==============================] - 0s 92us/sample - loss: 0.3339 - acc: 0.8663
Epoch 5/50
1249/1249 [==============================] - 0s 89us/sample - loss: 0.3057 - acc: 0.8719
Epoch 6/50
1249/1249 [==============================] - 0s 87us/sample - loss: 0.2877 - acc: 0.8799
Epoch 7/50
1249/1249 [==============================] - 0s 88us/sample - loss: 0.2752 - acc: 0.8815
Epoch 8/50
1249/1249 [==============================] - 0s 89us/sample - loss: 0.2650 - acc: 0.8783
Epoch 9/50
1249/1249 [==============================] - 0s 92us/sample - loss: 0.2562 - acc: 0.8847
Epoch 10/50
1249/1249 [==============================] - 0s 91us/sample - loss: 0.2537 - acc: 0.8799
Epoch 11/50
1249/1249 [==============================] - 0s 89us/sample - loss: 0.2468 - acc: 0.8903
Epoch 12/50
1249/1249 [==============================] - 0s 88us/sample - loss: 0.2436 - acc: 0.8927
Epoch 13/50
1249/1249 [==============================] - 0s 89us/sample - loss: 0.2420 - acc: 0.8935
Epoch 14/50
1249/1249 [==============================] - 0s 88us/sample - loss: 0.2366 - acc: 0.8935
Epoch 15/50
1249/1249 [==============================] - 0s 94us/sample - loss: 0.2305 - acc: 0.8951
Epoch 16/50
1249/1249 [==============================] - 0s 98us/sample - loss: 0.2265 - acc: 0.8991
Epoch 17/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2280 - acc: 0.8967
Epoch 18/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2247 - acc: 0.8951
Epoch 19/50
1249/1249 [==============================] - 0s 92us/sample - loss: 0.2237 - acc: 0.8975
Epoch 20/50
1249/1249 [==============================] - 0s 102us/sample - loss: 0.2196 - acc: 0.8991
Epoch 21/50
1249/1249 [==============================] - 0s 102us/sample - loss: 0.2223 - acc: 0.8983
Epoch 22/50
1249/1249 [==============================] - 0s 102us/sample - loss: 0.2163 - acc: 0.8943
Epoch 23/50
1249/1249 [==============================] - 0s 100us/sample - loss: 0.2177 - acc: 0.8983
Epoch 24/50
1249/1249 [==============================] - 0s 101us/sample - loss: 0.2165 - acc: 0.8983
Epoch 25/50
1249/1249 [==============================] - 0s 100us/sample - loss: 0.2148 - acc: 0.9007
Epoch 26/50
1249/1249 [==============================] - 0s 98us/sample - loss: 0.2189 - acc: 0.8903
Epoch 27/50
1249/1249 [==============================] - 0s 98us/sample - loss: 0.2099 - acc: 0.9023
Epoch 28/50
1249/1249 [==============================] - 0s 98us/sample - loss: 0.2102 - acc: 0.9023
Epoch 29/50
1249/1249 [==============================] - 0s 94us/sample - loss: 0.2091 - acc: 0.8975
Epoch 30/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2064 - acc: 0.9015
Epoch 31/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2044 - acc: 0.9023
Epoch 32/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2070 - acc: 0.9031
Epoch 33/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2045 - acc: 0.9039
Epoch 34/50
1249/1249 [==============================] - 0s 94us/sample - loss: 0.2007 - acc: 0.9063
Epoch 35/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.1999 - acc: 0.9055
Epoch 36/50
1249/1249 [==============================] - 0s 103us/sample - loss: 0.2010 - acc: 0.9039
Epoch 37/50
1249/1249 [==============================] - 0s 111us/sample - loss: 0.2053 - acc: 0.9031
Epoch 38/50
1249/1249 [==============================] - 0s 99us/sample - loss: 0.2018 - acc: 0.9039
Epoch 39/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2023 - acc: 0.9055
Epoch 40/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2019 - acc: 0.9015
Epoch 41/50
1249/1249 [==============================] - 0s 92us/sample - loss: 0.2040 - acc: 0.8983
Epoch 42/50
1249/1249 [==============================] - 0s 103us/sample - loss: 0.2033 - acc: 0.8943
Epoch 43/50
1249/1249 [==============================] - 0s 97us/sample - loss: 0.2024 - acc: 0.9039
Epoch 44/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.2047 - acc: 0.9079
Epoch 45/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.1996 - acc: 0.9039
Epoch 46/50
1249/1249 [==============================] - 0s 91us/sample - loss: 0.1979 - acc: 0.9079
Epoch 47/50
1249/1249 [==============================] - 0s 90us/sample - loss: 0.1960 - acc: 0.9087
Epoch 48/50
1249/1249 [==============================] - 0s 97us/sample - loss: 0.1969 - acc: 0.9055
Epoch 49/50
1249/1249 [==============================] - 0s 99us/sample - loss: 0.1950 - acc: 0.9087
Epoch 50/50
1249/1249 [==============================] - 0s 98us/sample - loss: 0.1956 - acc: 0.9071
  • Losses are usually defined between 0 and 1. There is no straight answer about whether 0.2 is good or not. For that you need to look at other metrics to understand what your model is capable of, e.g. precision-recall, AUC, confusion matrices, etc. – Karl May 09 '19 at 09:20
  • That's what I was looking for, thanks! – thomas dees May 09 '19 at 09:22
  • Also, you should really include validation data in your training run as well. As long as both training and validation loss continue to go down, you can assume that training for longer will make the model better. There is usually a point of divergence somewhere, though, where training loss keeps going down but validation loss starts going up again. This is where you start overfitting on the training data. – Karl May 09 '19 at 09:22
  • Can I do this by adding something like validation_split=0.2 to my model.fit() function? – thomas dees May 09 '19 at 09:43
  • Yes, that would work and is probably the fastest way to achieve this. In the longer term I would however recommend doing this split yourself and then explicitly passing your validation data to `fit` with `validation_data=...`. Doing the split yourself is a good idea since you can ensure consistency between training runs, even if (for example) your training data changes. See the sketch after these comments. – Karl May 09 '19 at 09:59
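Following up on the comments above, here is a minimal sketch of how the question's trainModel could be extended with an explicit validation set and a few extra evaluation metrics. The function name trainModelWithValidation, the 80/20 split, the random_state, and the use of scikit-learn are illustrative choices, not part of the original code.

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

def trainModelWithValidation(bow, unitlabels, units):
    x = np.array(bow)
    y = np.array(unitlabels)
    # Split once, up front, so the same validation set is reused across runs.
    x_train, x_val, y_train, y_val = train_test_split(
        x, y, test_size=0.2, random_state=42)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(256, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(len(units), activation=tf.nn.softmax)])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # validation_data reports val_loss/val_acc after every epoch; watch for
    # val_loss rising while the training loss keeps falling (overfitting).
    # The quick alternative: model.fit(x, y, epochs=50, validation_split=0.2)
    model.fit(x_train, y_train, epochs=50,
              validation_data=(x_val, y_val))
    # Per-class precision/recall and a confusion matrix on the held-out data.
    y_pred = np.argmax(model.predict(x_val), axis=1)
    print(classification_report(y_val, y_pred))
    print(confusion_matrix(y_val, y_pred))
    return model

Doing the split outside of fit also means the same x_val/y_val can be reused later, for example with model.evaluate or any other metric you care about.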

1 Answer


The maximum loss for cross-entropy happens when you have a uniform distribution over your classes: there is no inclination towards any class, so you get maximum entropy. Looking at the formula:

loss(y^(i), a^(i)) = -Σ_k y^(i)_k · log(a^(i)_k)

you can compute the maximum loss; the natural log (ln) is usually used for log. Since the target y^(i) is one-hot, the sum reduces to -log(a^(i)_k) for the true class k, and under the uniform assumption a^(i)_k = 1/len(units). For example, in binary classification a^(i)_k = 0.5, and -ln(0.5) ≈ 0.693147, so the maximum loss would be around 0.69.
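As a quick sanity check of that reference value, the uniform-prediction loss for a few class counts (the class counts here are just example values):

import numpy as np

# Cross-entropy of a one-hot target against a uniform prediction over
# n classes is -ln(1/n); for 2 classes this is the ~0.69 mentioned above.
for n_classes in (2, 3, 10):
    print(n_classes, "classes:", round(-np.log(1.0 / n_classes), 4))
# 2 classes: 0.6931
# 3 classes: 1.0986
# 10 classes: 2.3026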

nuric