I have some questions about fine-tuning a causal language model using transformers and PyTorch.
My main goal is to fine-tune XLNet. However, most of the posts I found online target text classification, like this post. I was wondering: is there any way to fine-tune the model without using the run_language_model.py script from transformers' GitHub?
Here is a piece of my code trying to fine-tune XLNet:
from transformers import XLNetLMHeadModel, XLNetTokenizer
import torch

model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased", do_lower_case=True)
LOSS = torch.nn.CrossEntropyLoss()
batch_texts = ["this is sentence 1", "i have another sentence like this", "the final sentence"]
# batch_encode_plus handles a list of texts; padding makes the batch rectangular
encodings = tokenizer.batch_encode_plus(batch_texts, add_special_tokens=True, padding=True,
                                        return_tensors="pt", return_attention_mask=True)
outputs = model(encodings["input_ids"], attention_mask=encodings["attention_mask"])
loss = LOSS(outputs[0], target_ids)  # stuck here: what should target_ids be?
loss.backward()
# ignoring the rest of the code...
I got stuck at the last two lines. First, when using this LM model, it seems I don't have any labels in the way supervised learning usually does. Second, since the language model is trained by minimizing the loss (cross-entropy here), I need a target_ids to compute the loss and perplexity against the input_ids.
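To make that concrete, the generic shift-and-cross-entropy computation I have in mind looks roughly like this (just my own sketch; the names shift_logits / shift_targets are mine, and I am not sure this is the right recipe for XLNet in particular):

# My rough idea of computing loss/perplexity by hand (unsure it is right for XLNet):
# use the inputs themselves as targets, shifted by one position.
logits = outputs[0]                          # (batch, seq_len, vocab_size)
target_ids = encodings["input_ids"].clone()  # .clone() since input_ids is a torch tensor

shift_logits = logits[:, :-1, :].contiguous()
shift_targets = target_ids[:, 1:].contiguous()

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)  # skip padding positions
loss = loss_fn(shift_logits.view(-1, shift_logits.size(-1)), shift_targets.view(-1))
perplexity = torch.exp(loss)                 # perplexity = exp(cross-entropy)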
Here are my follow-up questions:
- How should I deal with the labels during model fitting? (One alternative I have considered is sketched after this list.)
- Should I set something like target_ids = encodings["input_ids"].clone() to compute the cross-entropy loss and perplexity, roughly as in the sketch above?
- If not, how should I set this target_ids?
- Following the perplexity page from transformers' documentation, how should I adapt its method to input text of non-fixed length?
- I saw another post in the documentation saying that causal language modeling requires padding the text. However, the perplexity page linked above shows no sign of padding. Which one should I follow?
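Regarding the first question, here is the alternative I have considered, based on the fact that the LM head models in transformers can take a labels argument and return the loss themselves. This is only my guess: I do not know whether XLNetLMHeadModel interprets labels the way I expect (e.g. whether it shifts or permutes them internally), and the -100 masking is something I copied from other examples.

# My guess: reuse the input ids as labels and let the model compute the loss.
# Not sure this is right for XLNet, which is part of what I am asking about.
labels = encodings["input_ids"].clone()
labels[encodings["attention_mask"] == 0] = -100   # -100 positions are ignored in the loss
outputs = model(encodings["input_ids"],
                attention_mask=encodings["attention_mask"],
                labels=labels)
loss = outputs[0]                  # with labels given, the first output should be the loss
perplexity = torch.exp(loss)
loss.backward()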
Any suggestions and advice will be appreciated!