I am training a sequence-to-sequence model using HuggingFace Transformers' Seq2SeqTrainer. When I execute the training process, it reports the following warning:

/path/to/python3.9/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)

Note that the HuggingFace documentation link in the warning is dead.

I use the following code:

model = BartForConditionalGeneration.from_pretrained(checkpoint)
model.config.output_attentions = True
model.config.output_hidden_states = True

training_args = Seq2SeqTrainingArguments(
    output_dir = "output_dir_here",
    evaluation_strategy = IntervalStrategy.STEPS, #"epoch",
    optim = "adamw_torch", # Use new PyTorch optimizer
    eval_steps = 1000, # New
    logging_steps = 1000,
    save_steps = 1000,
    learning_rate = 2e-5,
    per_device_train_batch_size = batch_size,
    per_device_eval_batch_size = batch_size,
    weight_decay = 0.01,
    save_total_limit = 3,
    num_train_epochs = 30,
    predict_with_generate=True,
    remove_unused_columns=True,
    fp16 = True,
    push_to_hub = True,
    metric_for_best_model = 'bleu', # New or "f1"
    load_best_model_at_end = True # New
)

trainer = Seq2SeqTrainer(
    model = model,
    args = training_args,
    train_dataset = train_ds,
    eval_dataset = eval_ds,
    tokenizer = tokenizer,
    data_collator = data_collator,
    compute_metrics = compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
)

trainer.train()

The training completes without any problem, but I am concerned about the deprecation warning. How should I modify the code to resolve it?

Versions:

  • Transformers 4.28.1
  • Python 3.9.7
Raptor
  • Please show all the parameters you've used inside `Seq2SeqTrainingArguments` and `Seq2SeqTrainer`. Without that, it's hard to pinpoint which arguments are raising the deprecation warning. – alvas Jun 14 '23 at 02:52
  • Added. I thought it was related to the `model.config.xxx` lines. – Raptor Jun 14 '23 at 04:18
  • @MaciejSkorski sorry for the typo. Fixed. – Raptor Jun 20 '23 at 13:13
  • Thanks, I believe I managed to understand the issue; please try what I shared in my answer. – Maciej Skorski Jun 20 '23 at 14:29
  • Related: https://stackoverflow.com/questions/76633368/why-does-the-falcon-qlora-tutorial-code-use-eos-token-as-pad-token – Charlie Parker Jul 12 '23 at 17:29

1 Answer

Root-Cause

This warning flags an outdated use of the API, one that will soon be unsupported. As of now, the library corrects the issue on its own, which is why you only get a warning and not a breaking error.

See these lines in the source code.
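
To see the mechanism concretely, here is a minimal repro sketch of my own (assuming facebook/bart-large-cnn and transformers around 4.28): mutating a generation-related attribute on model.config and then calling generate() without an explicit config should surface exactly this warning.

import warnings
from transformers import AutoTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

# Deprecated path: mutate generation-related attributes on the model config.
model.config.min_length = 1
model.config.max_length = 16

inputs = tokenizer(["A short test sentence."], return_tensors="pt")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.generate(inputs["input_ids"])  # no generation_config passed

print([str(w.message) for w in caught])  # expect the "deprecated strategy" UserWarning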

Remedy

The transformers library encourages controlling generation through config files. In this case, we need to pass a GenerationConfig object to generate() explicitly, rather than setting attributes on the model config.


I will first share a clean, simple example:

from transformers import AutoTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

ARTICLE_TO_SUMMARIZE = (
    "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
    "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
    "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
)
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors="pt")

# change the generation settings via a GenerationConfig, then summarize

from transformers.generation import GenerationConfig

# Set generation attributes on a GenerationConfig object (the supported path)
# instead of mutating model.config (the deprecated path that the warning flags):
gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.max_new_tokens = 10
gen_cfg.min_length = 1

summary_ids = model.generate(inputs["input_ids"], generation_config=gen_cfg)
tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

If you manipulate the config attributes directly and pass no generation config, you get the warning; if you pass a GenerationConfig, you are all good. This example is reproducible as a Colab notebook here.
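
Reusing model, inputs, and gen_cfg from the example above, here is a quick check that the explicit-config path stays silent:

import warnings

# Passing an explicit GenerationConfig bypasses the deprecated code path:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.generate(inputs["input_ids"], generation_config=gen_cfg)

assert not any("deprecated strategy" in str(w.message) for w in caught)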


Now, to the original question. In general, changing the architecture config of a pretrained model is not recommended because of compatibility issues, though it is sometimes possible with extra effort. Certain config changes can, however, be made at initialization:

model = BartForConditionalGeneration.from_pretrained(
     "facebook/bart-large-cnn", 
     attention_dropout=0.123
)
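
A quick check (mirroring the assert at the end of the full script below) that the keyword override took effect:

assert model.config.attention_dropout == 0.123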

Here is the fully working code, corrected for reproducibility; see also this notebook:

from transformers import AutoTokenizer, BartForConditionalGeneration
from transformers.generation import GenerationConfig
from transformers import Trainer, TrainingArguments
from transformers.models.bart.modeling_bart import shift_tokens_right
from transformers import DataCollatorForSeq2Seq

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn", attention_dropout=0.123)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
seq2seq_data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

def get_features(batch):
    input_encodings = tokenizer(batch["text"], max_length=1024, truncation=True)
    
    with tokenizer.as_target_tokenizer():
        target_encodings = tokenizer(batch["summary"], max_length=256, truncation=True)
        
    return {"input_ids": input_encodings["input_ids"], 
           "attention_mask": input_encodings["attention_mask"], 
           "labels": target_encodings["input_ids"]}

# NOTE: `dataset` is assumed to be a DatasetDict with "text" and "summary"
# columns and train/test splits; it is not defined in this snippet.
dataset_ftrs = dataset.map(get_features, batched=True)
columns = ['input_ids', 'attention_mask', 'labels']
dataset_ftrs.set_format(type='torch', columns=columns)

# Note: these two lines still mutate model.config. With the plain Trainer used
# below (which never calls generate() during evaluation) this should not trigger
# the generation-config warning, but it likely would under Seq2SeqTrainer with
# predict_with_generate=True.
model.config.output_attentions = True
model.config.output_hidden_states = True

training_args = TrainingArguments(
    output_dir='./models/bart-summarizer', 
    num_train_epochs=1, 
    warmup_steps=500,                                  
    per_device_train_batch_size=1, 
    per_device_eval_batch_size=1, 
    weight_decay=0.01, 
    logging_steps=10, 
    push_to_hub=False, 
    evaluation_strategy='steps', 
    eval_steps=500, 
    save_steps=1e6, 
    gradient_accumulation_steps=16,
)

trainer = Trainer(
    model=model, 
    args=training_args, 
    tokenizer=tokenizer,                  
    data_collator=seq2seq_data_collator,                  
    train_dataset=dataset_ftrs["train"],                  
    eval_dataset=dataset_ftrs["test"],
)

assert model.config.attention_dropout==0.123

#trainer.train()
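
Finally, back to the exact Seq2SeqTrainer setup from the question (asked about in the comments below): here is a minimal sketch, assuming transformers 4.28, where Seq2SeqTrainer with predict_with_generate=True falls back to model.generation_config. Re-deriving the generation config from the already-modified model config keeps the two in sync, so generate() should find no mismatch to warn about:

from transformers import BartForConditionalGeneration
from transformers.generation import GenerationConfig

model = BartForConditionalGeneration.from_pretrained(checkpoint)  # `checkpoint` as in the question
model.config.output_attentions = True
model.config.output_hidden_states = True

# Rebuild the generation config from the (already modified) model config so
# that model.config and model.generation_config agree again:
model.generation_config = GenerationConfig.from_model_config(model.config)

# ... then construct Seq2SeqTrainingArguments / Seq2SeqTrainer exactly as in
# the question and call trainer.train().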
Maciej Skorski
  • The warning still exists after the change has been made. – Raptor Jun 21 '23 at 07:25
  • @Raptor, right, I gave the right diagnosis but followed with the wrong recipe. You need to pass a config object; see the updated answer. – Maciej Skorski Jun 21 '23 at 12:26
  • Please refer to my question: where can I fit the `gen_cfg` into my code? In `trainer.train()`? – Raptor Jun 27 '23 at 09:49
  • @Raptor, I believe I root-caused and explained the core issue (as the upvotes demonstrate), but your code was incomplete. See the updated answer, where I added more on your specific case. – Maciej Skorski Jun 27 '23 at 21:26