I have a very interesting problem. I use the XLM-RoBERTa model for multilabel text classification, with 1s and 0s as the labels. I get 5 months of text data from the database, do a train/validation/test split, and get 86% accuracy on the test data. Everything is good up to that point. Then I provide an extra test set from the 6th month, with exactly the same text preprocessing, but the model gives me very bad predictions (almost random choice). I cannot understand how that is possible. Do you have any idea? Thanks in advance for your help.
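For context, here is a minimal sketch of the setup described above (the file name `texts.csv`, the `text`/`labels`/`month` columns, and the split sizes are all hypothetical placeholders): the in-sample test set is drawn randomly from the 5 months, while the extra test set is simply everything from the 6th month.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical export of the 6 months of labelled texts from the database
df = pd.read_csv("texts.csv")

in_sample = df[df["month"] <= 5]   # the 5 months used for train/val/test
month6 = df[df["month"] == 6]      # the extra, later test set

# Random split within the 5 months
train_df, test_df = train_test_split(in_sample, test_size=0.2, random_state=42)
train_df, val_df = train_test_split(train_df, test_size=0.1, random_state=42)

# test_df scores ~86% accuracy; month6 scores close to random,
# even though both go through the same text preprocessing.
```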
1 Answer
Try checking whether you used the same tokenizer to train the model and to perform inference.
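A rough sketch of what that looks like (assuming the Hugging Face transformers library; the output directory `xlmr_multilabel` and the label count of 5 are hypothetical): save the tokenizer next to the fine-tuned checkpoint during training, and reload both from that same directory at inference time instead of instantiating a fresh tokenizer.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "xlmr_multilabel"          # hypothetical output directory

# --- training time: keep the tokenizer with the checkpoint ---
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=5,                      # hypothetical number of labels
    problem_type="multi_label_classification",
)
# ... fine-tuning on the 5 months of data happens here ...
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)

# --- inference time (6th-month data): reload BOTH from the same directory ---
tokenizer = AutoTokenizer.from_pretrained(model_dir)   # not a new/different tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

inputs = tokenizer(["example text from month 6"],
                   truncation=True, padding=True, return_tensors="pt")
logits = model(**inputs).logits        # apply sigmoid + threshold for the 0/1 labels
```

If the inference pipeline builds its own tokenizer (different checkpoint, vocabulary, casing, or max length), the token IDs fed to the model no longer mean what they did during training, which can make predictions look close to random.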

chancar