
I have a puzzling problem. I use an xlm-roberta model for multilabel text classification, with labels encoded as 1s and 0s. I pull 5 months of text data from a database, do a train/validation/test split, and get 86% accuracy on the test data. Everything is fine up to that point. Then I evaluate on an extra test set from the 6th month, using exactly the same text preprocessing, and the predictions are very bad (almost random). I cannot understand how that is possible. Do you have any idea? Thanks in advance for your help.

İhsan Dağ

1 Answer


Try checking whether you used the same tokenizer for training the model and for inference.
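One way to run that check is to encode a few sample texts with both tokenizers and compare the resulting ids. The following is only a minimal sketch: the helper name `find_tokenizer_mismatches` is hypothetical, and it assumes each tokenizer is a callable that returns a mapping with an `"input_ids"` entry when given a single string, which is how Hugging Face tokenizers behave.

```python
from typing import Callable, Iterable, List, Mapping, Sequence

# Assumption: a tokenizer is any callable that maps one string to a mapping
# containing "input_ids" (Hugging Face tokenizers behave this way).
Tokenizer = Callable[[str], Mapping[str, Sequence[int]]]


def find_tokenizer_mismatches(
    train_tok: Tokenizer, infer_tok: Tokenizer, texts: Iterable[str]
) -> List[str]:
    """Return the texts for which the two tokenizers produce different ids."""
    mismatches = []
    for text in texts:
        train_ids = list(train_tok(text)["input_ids"])
        infer_ids = list(infer_tok(text)["input_ids"])
        if train_ids != infer_ids:
            mismatches.append(text)
    return mismatches
```

With the `transformers` library, the two arguments could be the tokenizer object used in the training script and the one loaded in the inference script (for example via `AutoTokenizer.from_pretrained`), and `texts` a handful of rows from the 6th-month set. An empty result means the two tokenizers agree on those samples, so the cause probably lies elsewhere.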

chancar