BERT high test accuracy but bad predictions on new data

Question

I have an interesting problem. I am using an xlm-roberta model for multilabel text classification, with labels encoded as 1s and 0s. I pull 5 months of text data from a database, do a train/validation/test split, and get 86% accuracy on the test data. Everything is fine up to that point. When I then provide an extra test set from the 6th month, with exactly the same text preprocessing, the predictions are very bad (almost random). I cannot understand how that is possible. Do you have any idea? Thanks in advance for your help.

score 0 · Answer 1 · answered Aug 31 '22 at 22:56


Check whether you used the same tokenizer for training the model and for inference.
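One way to run that check is sketched below. It is only an illustration: the helper name `tokenizer_mismatches` and the two toy encoders are made up for this example, and the commented Hugging Face usage assumes hypothetical checkpoint paths that are not from the question.

```python
from typing import Callable, Iterable, List, Sequence


def tokenizer_mismatches(
    train_encode: Callable[[str], Sequence],
    infer_encode: Callable[[str], Sequence],
    texts: Iterable[str],
) -> List[str]:
    """Return the texts that the two encoders tokenize differently.

    Hypothetical helper: each encoder is any callable mapping a string
    to a sequence of tokens or token ids.
    """
    return [t for t in texts if list(train_encode(t)) != list(infer_encode(t))]


# Toy encoders standing in for real tokenizers (assumption, for illustration).
def cased_encode(text: str) -> List[str]:
    return text.split()


def lowercased_encode(text: str) -> List[str]:
    return text.lower().split()


if __name__ == "__main__":
    samples = ["Hello world", "already lowercase"]
    # Texts listed here are encoded differently at training and inference time.
    print(tokenizer_mismatches(cased_encode, lowercased_encode, samples))

# With Hugging Face tokenizers the call might look like this
# (paths are placeholders, not taken from the question):
#   from transformers import AutoTokenizer
#   train_tok = AutoTokenizer.from_pretrained("path/to/training_checkpoint")
#   infer_tok = AutoTokenizer.from_pretrained("path/to/inference_checkpoint")
#   tokenizer_mismatches(
#       lambda t: train_tok(t)["input_ids"],
#       lambda t: infer_tok(t)["input_ids"],
#       sample_texts,
#   )
```

If the returned list is empty for a sample of the 6th-month texts, the tokenizer is probably not the cause and the difference lies elsewhere.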


chancar

