I'm trying to find an optimal learning rate using PyTorch Lightning's pl.tuner.Tuner, but the results aren't what I expected.
The model is a linear classifier on top of a BertForSequenceClassification AutoModel, and I want to find the optimum learning rate while the BERT backbone is frozen.
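Roughly, the module is set up like this (a stripped-down sketch rather than my actual class; the class name, checkpoint name and head size are placeholders):

import torch
import pytorch_lightning as pl
from transformers import AutoModel


class FrozenBertClassifier(pl.LightningModule):
    def __init__(self, n_classes=2, learning_rate=1e-3, n_warmup_steps=0, n_training_steps=1000):
        super().__init__()
        self.learning_rate = learning_rate
        self.n_warmup_steps = n_warmup_steps
        self.n_training_steps = n_training_steps
        # pretrained backbone, frozen so that only the linear head should train
        self.bert = AutoModel.from_pretrained("bert-base-cased")
        for param in self.bert.parameters():
            param.requires_grad = False
        # linear classifier on top of the BERT output
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, n_classes)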
To search for the learning rate I run:
tuner = pl.tuner.Tuner(trainer)
results = tuner.lr_find(
    model,
    # optimizer = optimizer,
    train_dataloaders=data_module,
    min_lr=10e-8,
    max_lr=10.0,
)
# Plot with
fig = results.plot(suggest=True)
fig.show()
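For reference, the numeric suggestion can also be read straight off the lr_find result:

# the result object exposes the suggested learning rate directly
suggested_lr = results.suggestion()
print(suggested_lr)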
My optimizer is configured like this in the model:
def configure_optimizers(self):
    """Create the AdamW optimizer and a linear warmup/decay schedule."""
    optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate)
    # get_linear_schedule_with_warmup comes from transformers
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps,
    )
    return dict(optimizer=optimizer, lr_scheduler=dict(scheduler=scheduler, interval="step"))
This produces:
[Chart of loss against learning rate]
I'm confused as to why the loss increases at the lower learning rates; that is not what I was expecting.
I have tried:
- removing the scheduler (see the sketch after this list)
- freezing/unfreezing the weights
- changing the initial learning rate
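For concreteness, the scheduler-free variant of configure_optimizers that I tried looks roughly like this (the requires_grad filter is just one way of restricting the optimizer to the unfrozen head):

def configure_optimizers(self):
    # optimize only the parameters that are not frozen
    trainable_params = [p for p in self.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable_params, lr=self.learning_rate)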
I was expecting a chart like this: https://github.com/comhar/pytorch-learning-rate-tuner/blob/master/images/learning_rate_tuner_plot.png
Any help appreciated
Many thanks