I'm using Trainer & TrainingArguments to train a GPT-2 model, but it doesn't seem to be working as expected.
My dataset contains the token IDs of my corpus and an attention mask for each text, indicating where attention should be applied (the sketch after the listing shows roughly how it was built):
Dataset({
    features: ['attention_mask', 'input_ids', 'labels'],
    num_rows: 2012860
})
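For context, it was built roughly along these lines (a simplified sketch; raw_dataset, the "text" column name, and max_length=128 are placeholders, not my exact values):

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    # padding is what produces the attention_mask (1 for real tokens, 0 for padding)
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    # causal LM: labels mirror input_ids
    out["labels"] = [ids.copy() for ids in out["input_ids"]]
    return out

dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])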
I run the training with Trainer & TrainingArguments, passing my model and the dataset above as follows, but nowhere do I specify anything about the attention_mask:
training_args = TrainingArguments(
    output_dir=path_save_checkpoints,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=5_000,
    save_steps=5_000,
    fp16=True,
    deepspeed="ds_config.json",
    remove_unused_columns=True,
    debug=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
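Before launching the full run, I can at least peek at one collated batch from the training dataloader (a minimal check, assuming the trainer above has already been constructed) to see which columns the collator actually produces, but that still doesn't tell me whether the Trainer forwards the mask to the model:

# Inspect one collated batch to see which columns are batched together
batch = next(iter(trainer.get_train_dataloader()))
print(batch.keys())                   # e.g. input_ids, attention_mask, labels
print(batch["attention_mask"].shape)  # (per_device_train_batch_size, seq_len)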
How should I tell the Trainer to use this feature (attention_mask)? If you look at the file /transformers/trainer.py, there is no reference to "attention" or "mask" anywhere.
Thanks in advance!