When I add an ExponentialDecay learning rate schedule to my Adam optimizer, the training behavior changes even before the decay should become effective. I used the following definition for the schedule:
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    1e-3, decay_steps=25, decay_rate=0.95, staircase=True)
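As a sanity check, the schedule object can also be called directly with a step index. With staircase=True it computes initial_lr * decay_rate ** floor(step / decay_steps), so any step below 25 should give back the initial 1e-3 (this snippet is only for inspection and is not part of the training code below):

import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    1e-3, decay_steps=25, decay_rate=0.95, staircase=True)

for step in [0, 10, 24, 25, 50]:
    # floor(step / 25) is 0 for the first 25 steps, so the value stays at 1e-3
    print(step, float(lr_schedule(step)))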
Since I'm using staircase=True, there should be no difference during the first 25 epochs compared to using a static learning rate of the same value. So the following two optimizers should yield identical training results for the first 25 epochs:
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
However, I observed that the training behavior already differs before that point.
This is the test code I used:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

np.random.seed(0)
# Synthetic quadratic data with Gaussian noise
x_data = 2 * np.random.random(size=(1000, 1))
y_data = np.random.normal(loc=x_data**2, scale=0.05)

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    1e-3, decay_steps=25, decay_rate=0.95, staircase=True)

histories = []
learning_rates = [1e-3, lr_schedule]
for lr in learning_rates:
    tf.random.set_seed(0)  # same weight initialization for both runs
    model = tf.keras.models.Sequential([
        Dense(10, activation='tanh', input_dim=1), Dropout(0.2),
        Dense(10, activation='tanh'), Dropout(0.2),
        Dense(1)
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=optimizer, loss='mse')
    history = model.fit(x_data, y_data, epochs=50)
    histories.append(history.history['loss'])

fig, ax = plt.subplots()
ax.set(xlabel='Epoch', ylabel='Loss')
ax.plot(histories[0], label='Static learning rate')
ax.plot(histories[1], label='Learning rate schedule')
ax.legend()
plt.show()
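To double-check which learning rate the optimizer actually reports during these runs, a small callback along these lines could be passed to model.fit via callbacks=[LRLogger()] (just a sketch; LRLogger is a name I made up, and it reads the optimizer's learning_rate attribute, evaluating it at the current step if it is a schedule):

import tensorflow as tf

class LRLogger(tf.keras.callbacks.Callback):
    # Print the learning rate the optimizer reports at the end of each epoch.
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.learning_rate
        if isinstance(lr, tf.keras.optimizers.schedules.LearningRateSchedule):
            # Schedules are evaluated at the optimizer's current step counter
            lr = lr(self.model.optimizer.iterations)
        print(f" epoch {epoch + 1}: learning rate = {float(lr):.6g}")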
I'm using Python 3.7.9 and the following install of TensorFlow:
$ conda list | grep tensorflow
tensorflow 2.1.0 mkl_py37h80a91df_0
tensorflow-base 2.1.0 mkl_py37h6d63fb7_0
tensorflow-estimator 2.1.0 pyhd54b08b_0