Why do the loss and the metric give different results during training, although both use the same function?

Question

I am building a deep learning model as follows:

import tensorflow as tf

tf.keras.backend.clear_session()
model = tf.keras.models.Sequential([
  # Variable-length (ragged) sequences with 30 features per time step
  tf.keras.layers.InputLayer(input_shape=(None, 30), dtype=tf.float64, ragged=True),
  tf.keras.layers.SimpleRNN(20, return_sequences=True),
  tf.keras.layers.SimpleRNN(20),
  tf.keras.layers.Dense(2, activation='relu'),
  # Scale the two non-negative outputs by 100
  tf.keras.layers.Lambda(lambda x: x * 100.0)
])
model.summary()

optimizer = tf.keras.optimizers.Adam()
# The same 'mae' function is used both as the loss and as a metric
model.compile(loss='mae',
              optimizer=optimizer,
              metrics=['mae'])
history = model.fit(train_examples, train_labels, batch_size=100,
                    epochs=30, validation_data=(test_examples, test_labels))

The model summary and training log are:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       (None, None, 20)          840       
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 20)                820       
_________________________________________________________________
dense (Dense)                (None, 2)                 42        
_________________________________________________________________
lambda (Lambda)              (None, 2)                 0         
=================================================================
Total params: 1,702
Trainable params: 1,702
Non-trainable params: 0
_________________________________________________________________
Train on 120000 samples, validate on 25741 samples
Epoch 1/10
1102849/120000 [==============================] - 16s 15us/sample - loss: 1628.9943 - mae: 1809.3457 - val_loss: 3446.0632 - val_mae: 456.3423
Epoch 2/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 718.8068 - mae: 793.9257 - val_loss: 1901.1981 - val_mae: 279.6073
Epoch 3/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 422.3028 - mae: 447.7400 - val_loss: 1254.6451 - val_mae: 208.5223
Epoch 4/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 286.6912 - mae: 303.5997 - val_loss: 981.5281 - val_mae: 183.8901
Epoch 5/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 251.7744 - mae: 241.2676 - val_loss: 875.2967 - val_mae: 166.8759
Epoch 6/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 191.8624 - mae: 210.4237 - val_loss: 763.8144 - val_mae: 154.2717
Epoch 7/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 182.3771 - mae: 196.3229 - val_loss: 732.1604 - val_mae: 148.1582
Epoch 8/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 211.0438 - mae: 186.6520 - val_loss: 721.2206 - val_mae: 151.4263
Epoch 9/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 165.2457 - mae: 181.0769 - val_loss: 742.0166 - val_mae: 160.5220
Epoch 10/10
1102849/120000 [==============================] - 14s 12us/sample - loss: 165.6894 - mae: 176.1805 - val_loss: 688.5046 - val_mae: 147.5439

During training, I get different values for the loss and the metric, although both use the 'mae' function. In particular, 'val_loss' is far higher than 'val_mae'. I am using TF 2.1.0. Any idea why?

Thanks!

The Mean Absolute Error used as the loss is, of course, what drives the optimization of the NN. After a batch of samples passes through the network, the loss is calculated and the weights are updated using gradient descent. When the second batch goes through the network, its loss will be different because the weights have already been updated. The MAE included in the metrics, on the other hand, calculates the error over all batches in the epoch. This leads to a difference between the two MAE values. — Shubham Panchal, Apr 11 '20 at 06:46
Thank you for your reply! But I still have a question. You are saying that the loss is calculated per batch and the MAE is calculated once all the batches have passed. What I understand is this: the loss value I see at the end of each epoch is something like the mean of the losses obtained from the batches in that epoch, whereas the metric 'mae' is calculated at the end of the epoch with the final weights. If I am correct, this might explain the difference. @Shubham Panchal — Arwen, Apr 11 '20 at 18:54
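
A toy sketch of the distinction Arwen describes, in plain Python rather than Keras: averaging per-batch MAE values measured while the weights keep changing, versus one MAE computed over all data with the final weights. The one-parameter model `y_hat = w * x`, the data, and the learning rate are all hypothetical. The sketch only shows that the two numbers can differ; it makes no claim about how Keras itself aggregates `loss` and `mae`.

```python
# Hypothetical one-parameter model y_hat = w * x, trained with plain
# (sub)gradient descent on MAE. Not Keras; illustration only.

def mae(w, batch):
    """Mean absolute error of y_hat = w * x over a list of (x, y) pairs."""
    return sum(abs(w * x - y) for x, y in batch) / len(batch)

def mae_grad(w, batch):
    """Subgradient of the batch MAE with respect to w."""
    g = 0.0
    for x, y in batch:
        r = w * x - y
        sign = 1.0 if r > 0 else (-1.0 if r < 0 else 0.0)
        g += sign * x
    return g / len(batch)

def run_epoch(w, batches, lr):
    """Return (updated w, mean of the per-batch losses seen during the epoch)."""
    batch_losses = []
    for batch in batches:
        batch_losses.append(mae(w, batch))  # measured BEFORE this batch's update
        w -= lr * mae_grad(w, batch)        # weights change after every batch
    return w, sum(batch_losses) / len(batch_losses)

# Toy data: y = 3x, split into 4 equal batches of 5 samples
data = [(float(x), 3.0 * x) for x in range(1, 21)]
batches = [data[i:i + 5] for i in range(0, 20, 5)]

w_final, epoch_running_loss = run_epoch(0.0, batches, lr=0.1)
final_weights_mae = mae(w_final, data)  # whole dataset, final weights only

print(epoch_running_loss, final_weights_mae)  # about 16.525 vs 12.6
```

The running average mixes losses from four different weight values (0, 0.3, 1.1 and 2.4), while the second number uses only the final weight 4.2, so the two MAE figures disagree even though the same `mae` function produces both.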
