Predictive model performs exceedingly well during training and testing, but predicts zero when predicting the very same data

Question

I've built a binary classification model that predicts whether an article belongs to the positive or the negative class. TF-IDF vectors, together with one additional feature (day_of_year), are fed into an XGBoost classifier. I get an AUC very close to 1 in training/testing and in cross-validation, but only .5 on my holdout data. That seemed odd, so I fed the very same training data back into the model, and even that returns an AUC of .5. The function below takes a dataframe, fits the TF-IDF vectorizer, transforms the text, and packs everything into DMatrix objects.

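import pickle

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# tfidf is a module-level TfidfVectorizer instance created elsewhere.

def format_to_dmatrix(known_targets):
  y = known_targets['target']
  X = known_targets[['body', 'day_of_year']]
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state=42)

  # Fit on the training split only, then store the learned vocabulary
  # (a dict mapping each term to its column index).
  tfidf.fit(X_train['body'])
  with open("tfidf_features.pkl", "wb") as f:
    pickle.dump(tfidf.vocabulary_, f)
  X_train_enc = tfidf.transform(X_train['body']).toarray()
  X_test_enc = tfidf.transform(X_test['body']).toarray()

  # Column names: TF-IDF terms followed by the extra feature.
  new_cols = tfidf.get_feature_names()
  new_cols.append('day_of_year')

  # Reshape day_of_year into a column vector for each split.
  a = np.array(X_train['day_of_year'])
  a = a.reshape(a.shape[0], 1)
  b = np.array(X_test['day_of_year'])
  b = b.reshape(b.shape[0], 1)

  # Append day_of_year as the last column.
  X_train = np.append(X_train_enc, a, axis=1)
  X_test = np.append(X_test_enc, b, axis=1)

  dtrain = xgb.DMatrix(X_train, label=y_train.values, feature_names=new_cols)
  dtest = xgb.DMatrix(X_test, label=y_test.values, feature_names=new_cols)
  return dtrain, dtest, tfidf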

Cross-validation gives a test-auc-mean of .9979, so I train the final model with the call below and save it.

best_model = xgb.train(
  params,
  dtrain,
  num_boost_round=num_boost_round,
  evals=[(dtest, "Test")]
)

This is my code to load in new data:

def test_newdata(data):
  with open("tfidf_features.pkl", "rb") as f:
    tf1 = pickle.load(f)
  tf1_new = TfidfVectorizer(
    max_features=1500,
    lowercase=True,
    analyzer='word',
    stop_words='english',
    ngram_range=(1, 1),
    vocabulary=tf1.keys(),
  )
  encoded_body = tf1_new.fit_transform(data['body']).toarray()
  new_cols = tf1_new.get_feature_names()
  new_cols.append('day_of_year')
  day_of_year = np.array(data['day_of_year'])
  day_of_year = day_of_year.reshape(day_of_year.shape[0], 1)
  formatted_test_data = np.append(encoded_body, day_of_year, axis=1)
  df = pd.DataFrame(formatted_test_data, columns=new_cols)
  return xgb.DMatrix(df)
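
One thing that may be worth checking in the loader above is whether the TF-IDF columns come out in the same order as at training time. The pickled vocabulary_ is a dict mapping each term to its column index, and, as far as I understand scikit-learn, passing only its keys makes the new vectorizer number the terms in iteration order instead of reusing the stored indices. The sketch below is a toy pure-Python model of that effect, not scikit-learn's implementation: build_vocabulary and columns_from_keys are hypothetical helpers, and the alphabetical indexing is an assumption about how the fitted vocabulary was built.

```python
# Toy model of a fitted vocabulary (term -> column index); not scikit-learn code.
def build_vocabulary(tokens):
    """Record terms in order of first appearance, then index them alphabetically."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, None)          # insertion order = first appearance
    for idx, term in enumerate(sorted(vocab)):
        vocab[term] = idx                    # column index = alphabetical rank
    return vocab


def columns_from_keys(vocab):
    """Re-index from the keys alone, discarding the stored column indices."""
    return {term: idx for idx, term in enumerate(vocab.keys())}


train_vocab = build_vocabulary(["profit", "earnings", "loss", "dividend"])
rebuilt = columns_from_keys(train_vocab)

# Terms whose column moved between training and prediction time.
moved = sorted(t for t in train_vocab if train_vocab[t] != rebuilt[t])
print(train_vocab)
print(rebuilt)
print(moved)
```

If the real columns are shuffled in this way, the booster would see each term's weight in the wrong position. Two things that could be tried are passing the whole mapping (vocabulary=tf1) so the stored indices are kept, or pickling the fitted vectorizer itself and calling transform rather than fit_transform, which also avoids refitting the IDF weights on the new data.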

The code below shows that the AUC is .5 even though I load the very same data. Is there an error I've missed somewhere?

loaded_model = xgb.Booster()
loaded_model.load_model("earn_modelv3.model")

holdout = known_targets
formatted_test_data = test_newdata(holdout)

holdout_preds = loaded_model.predict(formatted_test_data)

predictions_binary = np.where(holdout_preds > .5, 1, 0)
print(round(roc_auc_score(holdout['target'], predictions_binary), 4))
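
A side note on the metric: the AUC above is computed on predictions already thresholded at .5, so if every prediction lands on the same side of the threshold the score is exactly .5 regardless of how well the raw scores rank the articles. The sketch below illustrates this with a small pairwise AUC written in plain Python (the auc helper and the numbers are made up for illustration, they are not taken from my data). Comparing the AUC of the raw holdout_preds with that of predictions_binary might help tell "all predictions collapsed to zero" apart from "the ranking itself is random".

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: the scores rank the classes perfectly but all sit below 0.5.
labels = [0, 0, 1, 1]
scores = [0.05, 0.10, 0.30, 0.40]
binary = [1 if s > 0.5 else 0 for s in scores]   # every prediction becomes 0

auc_raw = auc(labels, scores)
auc_binary = auc(labels, binary)
print(auc_raw, auc_binary)
```

Here the raw scores separate the classes completely, yet the thresholded version scores .5 because all predictions are tied at zero.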

