Why does Hugging Face's TextDatasetForNextSentencePrediction make all the next-sentence labels the same?

Asked Sep 22 '22 at 01:07

Active Sep 22 '22 at 15:46

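from transformers import AutoTokenizer, TextDatasetForNextSentencePrediction

# Assumed: a cased BERT tokenizer (the snippet originally used it without defining it)
bert_cased_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=bert_cased_tokenizer,
    file_path="/path/to/your/dataset",
    block_size=256,
    nsp_probability=0.5,  # default value; changing it did not change the result
)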
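
The class reads file_path as plain text. The comments in the transformers source describe the expected layout as one sentence per line, with a blank line between documents, so that next-sentence pairs do not span documents. Below is a minimal sketch for writing a small corpus in that layout, assuming the documents are already split into sentences in memory. The helper name `write_nsp_corpus` is hypothetical and not part of transformers.

```python
from pathlib import Path
from typing import List


def write_nsp_corpus(path: str, documents: List[List[str]]) -> None:
    """Write one sentence per line, with a blank line between documents.

    Hypothetical helper for illustration; not part of transformers.
    """
    blocks = []
    for sentences in documents:
        # Flatten stray newlines so each sentence stays on exactly one line,
        # and drop empty sentences so they don't create extra blank lines.
        lines = [s.replace("\n", " ").strip() for s in sentences if s.strip()]
        if lines:
            blocks.append("\n".join(lines))
    # A single empty line separates consecutive documents.
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")


if __name__ == "__main__":
    docs = [
        ["I am very happy.", "Here is the second sentence."],
        ["A new document.", "It also has two sentences."],
    ]
    write_nsp_corpus("nsp_corpus.txt", docs)
    print(Path("nsp_corpus.txt").read_text(encoding="utf-8"))
```

If the whole file has no blank lines, it is read as a single document, which is worth ruling out when the labels look suspicious.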

When I run this code and check the next-sentence labels, they are either all 0 or all 1. Following the BERT paper, I expected roughly 50% of the pairs to be labeled 0 and 50% to be labeled 1, but I get 100% of one value. I changed nsp_probability, but the issue persists.
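
One way to confirm the label distribution is to count the labels directly. A minimal sketch, assuming each dataset item is a dict with a "next_sentence_label" entry holding a 0-d tensor, which is how TextDatasetForNextSentencePrediction builds its examples. The function name `nsp_label_counts` is hypothetical.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def nsp_label_counts(examples: Iterable[Mapping[str, Any]]) -> Counter:
    """Count next-sentence labels across dataset examples (hypothetical helper)."""
    counts = Counter()
    for example in examples:
        # int() accepts both a plain int and a 0-d torch tensor.
        counts[int(example["next_sentence_label"])] += 1
    return counts


if __name__ == "__main__":
    # Stand-in examples with plain ints; with the real dataset, pass
    # (dataset[i] for i in range(len(dataset))) instead.
    fake = [{"next_sentence_label": 0}, {"next_sentence_label": 1}]
    counts = nsp_label_counts(fake)
    total = sum(counts.values())
    for label, n in sorted(counts.items()):
        print(f"label {label}: {n} ({n / total:.0%})")
```

In the transformers implementation, label 1 marks a pair whose second segment was sampled at random and label 0 marks the true continuation, so a split close to the chosen nsp_probability is what one would expect to see.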

Ritesh Panditi

0 Answers