
I'm using Trainer & TrainingArguments to train a GPT-2 model, but it doesn't seem to work as expected.

My dataset contains the token ids of my corpus and the attention mask of each text, which indicates where attention should be applied:

Dataset({
    features: ['attention_mask', 'input_ids', 'labels'],
    num_rows: 2012860
})
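
For reference, a dataset with these columns can be built roughly like this (the checkpoint name, the raw texts, and max_length below are illustrative placeholders, not my exact setup):

from datasets import Dataset
from transformers import AutoTokenizer

# Placeholder corpus; the real dataset has ~2M rows
raw_dataset = Dataset.from_dict({"text": ["first example text", "second example text"]})

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default

def tokenize(batch):
    # The tokenizer returns both input_ids and attention_mask
    enc = tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()        # causal LM: labels mirror input_ids
    return enc

dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])
print(dataset)  # features: ['attention_mask', 'input_ids', 'labels']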

I am training with Trainer & TrainingArguments, passing my model and the dataset above as follows, but nowhere do I specify anything about the attention_mask:

training_args = TrainingArguments(
    output_dir=path_save_checkpoints,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=5_000,
    save_steps=5_000,
    fp16=True,
    deepspeed="ds_config.json",
    remove_unused_columns=True,
    debug=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
    tokenizer=tokenizer,
)

trainer.train()

How should I tell the Trainer to use the attention_mask? If you look at transformers/trainer.py, there is no reference to "attention" or "mask".

Thanks in advance!

aleonate
1 Answer


Somewhere in the Trainer source code, you will see that the inputs are passed to the model like this:

outputs = model(**inputs)

As long as your data collator returns a batch dictionary that includes an attention_mask key, the attention mask will be passed to your GPT-2 model automatically; there is nothing extra to configure on the Trainer.
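
A quick way to convince yourself is to collate a few rows by hand and inspect the keys that will be unpacked into model(**inputs). In the sketch below, dataset and tokenizer are the objects from your question; the concrete collator class is an assumption (DataCollatorForLanguageModeling with mlm=False is a common choice for GPT-2), since the question doesn't show how data_collator was built:

from transformers import DataCollatorForLanguageModeling

# Assumed collator; swap in your own data_collator if it differs
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Collate a few rows by hand (only the model inputs; this collator rebuilds labels itself)
features = [{"input_ids": dataset[i]["input_ids"],
             "attention_mask": dataset[i]["attention_mask"]} for i in range(4)]
batch = data_collator(features)
print(batch.keys())                   # should include 'attention_mask'
print(batch["attention_mask"].shape)  # (4, sequence_length)

Note also that remove_unused_columns=True only drops dataset columns whose names do not appear in the model's forward() signature; GPT2LMHeadModel.forward accepts attention_mask, so that column is kept and ends up in every batch.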