
I'm trying to find an optimal learning rate with PyTorch Lightning's pl.tuner.Tuner, but the results aren't what I expected.

The model I am running is a linear classifier on top of a BertForSequenceClassification model loaded via the Hugging Face AutoModel API.

I want to find the optimal learning rate for the classifier while the BERT model is frozen.
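
For context, my setup looks roughly like the sketch below (a minimal, illustrative version rather than my exact code; the class name, checkpoint, and batch keys are placeholders):

    import torch
    import pytorch_lightning as pl
    from transformers import AutoModelForSequenceClassification

    class FrozenBertClassifier(pl.LightningModule):
        def __init__(self, n_classes: int, learning_rate: float = 2e-5):
            super().__init__()
            self.learning_rate = learning_rate  # attribute the LR finder can update
            self.bert = AutoModelForSequenceClassification.from_pretrained(
                "bert-base-uncased", num_labels=n_classes
            )
            # Freeze the BERT encoder so only the classification head is trained
            for param in self.bert.bert.parameters():
                param.requires_grad = False

        def training_step(self, batch, batch_idx):
            outputs = self.bert(
                input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"],
            )
            self.log("train_loss", outputs.loss)
            return outputs.loss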

To do this, I am running the following code:

  
    # Run the learning-rate finder across a range of learning rates
    tuner = pl.tuner.Tuner(trainer)
    results = tuner.lr_find(
        model,
        # optimizer = optimizer,
        train_dataloaders=data_module,
        min_lr=10e-8,
        max_lr=10.0,
    )
    # Plot loss against learning rate, with the suggested value marked
    fig = results.plot(suggest=True)
    fig.show()
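
For reference, this is how the suggested value could be read out afterwards (a short sketch, assuming the standard lr_find result API):

    # The lr_find result also exposes the suggested learning rate directly
    suggested_lr = results.suggestion()
    print(f"Suggested learning rate: {suggested_lr}")
    model.learning_rate = suggested_lr  # apply before the real training run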

My optimizer is configured like this in the model:

    def configure_optimizers(self):
        """
        Set up AdamW with a linear warmup schedule that steps every batch.
        :return: optimizer/scheduler configuration for Lightning
        """
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.learning_rate)

        # get_linear_schedule_with_warmup comes from the transformers library
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.n_warmup_steps,
            num_training_steps=self.n_training_steps,
        )
        return dict(optimizer=optimizer, lr_scheduler=dict(scheduler=scheduler, interval="step"))

This produces:

[Chart of loss against learning rate]

I am confused as to why the loss increases at lower learning rates; this is not what I was expecting.

I have tried:

  • removing the scheduler
  • freezing/unfreezing the weights (a sketch of passing only the trainable parameters to the optimizer is below)
  • changing the initial learning rate
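
Since the point of freezing is that only the head should be optimized, a variant of configure_optimizers that hands only the trainable (unfrozen) parameters to AdamW would look roughly like this (a sketch, not my actual code):

    def configure_optimizers(self):
        # Optimize only parameters that are not frozen (requires_grad=True)
        trainable_params = [p for p in self.parameters() if p.requires_grad]
        return torch.optim.AdamW(trainable_params, lr=self.learning_rate)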

I was expecting a chart like this: https://github.com/comhar/pytorch-learning-rate-tuner/blob/master/images/learning_rate_tuner_plot.png

Any help appreciated

Many thanks
