Below is the code to configure TrainingArguments consumed from the HuggingFace transformers library to finetune the GPT2 language model.
training_args = TrainingArguments(
output_dir="./gpt2-language-model", #The output directory
num_train_epochs=100, # number of training epochs
per_device_train_batch_size=8, # batch size for training #32, 10
per_device_eval_batch_size=8, # batch size for evaluation #64, 10
save_steps=100, # after # steps model is saved
warmup_steps=500,# number of warmup steps for learning rate scheduler
prediction_loss_only=True,
metric_for_best_model = "eval_loss",
load_best_model_at_end = True,
evaluation_strategy="epoch",
learning_rate=0.00004, # learning rate
)
early_stop_callback = EarlyStoppingCallback(early_stopping_patience = 3)
trainer = Trainer(
model=gpt2_model,
args=training_args,
data_collator=data_collator,
train_dataset=train_dataset,
eval_dataset=test_dataset,
callbacks = [early_stop_callback],
)
The number of epochs as 100 and learning_rate as 0.00004 and also the early_stopping is configured with the patience value as 3.
The model ran for 5/100 epochs and noticed that the difference in loss_value is negligible. The latest checkpoint is saved as checkpoint-latest
.
Now Can I modify the learning_rate
may be to 0.01
from 0.00004
and resume the training from the latest saved checkpoint - checkpoint-latest
? Doing that will be efficient?
Or to train with the new learning_rate
value should I start the training from the beginning?