Right now I am trying to train/finetune a pretrained RoBERTa model with a multiple-choice head, but I am having difficulty finding the right input format so that my model is able to train/finetune.
The dataframe I have right now looks like this:
With the 3 options being tokenized sentences, using:
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
for i in range(0, len(train_data)):
    train_data["OptionA"][i] = tokenizer.encode(train_data["OptionA"][i])
    train_data["OptionB"][i] = tokenizer.encode(train_data["OptionB"][i])
    train_data["OptionC"][i] = tokenizer.encode(train_data["OptionC"][i])
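As context for what I think the model wants: as far as I can tell, RobertaForMultipleChoice expects the choices for each question stacked along an extra dimension, i.e. input_ids of shape (batch_size, num_choices, seq_len). A minimal sketch of that shape with made-up token ids (real ids would come from tokenizer.encode as above):

```python
import torch

# Made-up token ids standing in for the three encoded options of ONE question;
# in practice these come from tokenizer.encode and would be padded to equal length.
option_a = [0, 713, 16, 2]
option_b = [0, 152, 16, 2]
option_c = [0, 3293, 16, 2]

choices = torch.tensor([option_a, option_b, option_c])  # (num_choices, seq_len)
batch = choices.unsqueeze(0)                            # (batch_size, num_choices, seq_len)
print(batch.shape)  # torch.Size([1, 3, 4])
```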
My evaluation set looks like this as well, with the test set having 6500 rows and the evaluation set 1500 rows. I am trying to implement this with:
from transformers import RobertaForMultipleChoice, Trainer, TrainingArguments
model = RobertaForMultipleChoice.from_pretrained('roberta-base')
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)
trainer = Trainer(
    model=model,                 # the instantiated Transformers model to be trained
    args=training_args,          # training arguments, defined above
    train_dataset=train_split,   # training dataset
    eval_dataset=eval_split,     # evaluation dataset
)
trainer.train()
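One thing I have since noticed: Trainer indexes train_dataset by integer position (dataset[0] through dataset[len(dataset) - 1]), but a pandas DataFrame split keeps its original, non-contiguous index, so a positional lookup like row 2526 can fail with exactly this kind of KeyError. A sketch of a wrapper dataset that indexes with .iloc instead; the OptionA/OptionB/OptionC column names are from my frame, while the 'label' column name and the padding details are assumptions:

```python
import torch
from torch.utils.data import Dataset

class MultipleChoiceDataset(Dataset):
    """Positional wrapper so Trainer's integer indexing always works."""

    def __init__(self, df, max_length=32):
        # reset_index drops the original (possibly non-contiguous) index,
        # which is what can trigger errors like `KeyError: 2526` when a
        # split frame is looked up by position.
        self.df = df.reset_index(drop=True)
        self.max_length = max_length

    def __len__(self):
        return len(self.df)

    def _pad(self, ids):
        # RoBERTa's pad token id is 1; mask marks real tokens with 1, padding with 0.
        ids = ids[: self.max_length]
        mask = [1] * len(ids) + [0] * (self.max_length - len(ids))
        return ids + [1] * (self.max_length - len(ids)), mask

    def __getitem__(self, idx):
        row = self.df.iloc[idx]  # purely positional lookup
        padded = [self._pad(row[c]) for c in ("OptionA", "OptionB", "OptionC")]
        return {
            # shape (num_choices, seq_len), stacked per question
            "input_ids": torch.tensor([p[0] for p in padded]),
            "attention_mask": torch.tensor([p[1] for p in padded]),
            "labels": torch.tensor(int(row["label"])),  # 'label' column is assumed
        }
```

With something like this, train_split = MultipleChoiceDataset(train_df) and eval_split = MultipleChoiceDataset(eval_df) could then be passed to Trainer as before.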
But I keep getting different KeyErrors, for example:
KeyError: 2526
If anyone knows what I am doing wrong, I would be very grateful, as I have been stuck trying to train this model for the past 3 days.