The number of features of the model must match the input. Model n_features is 7985 and input n_features is 1

Question

I built a spam classifier with a random forest and wanted a separate function that classifies a single text message as spam or ham. I tried:

def predict_message(pred_text):
    pred_text=[pred_text]
    pred_text2 = tfidf_vect.fit_transform(pred_text)
    pred_features = pd.DataFrame(pred_text2.toarray())
    prediction = rf_model.predict(pred_features)
    return (prediction)

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)

but it gives me the error:

The number of features of the model must match the input.
Model n_features is 7985 and input n_features is 1

I can't see the problem. How can I make it work?

Why did you re-ask this same question on the [meta] meta site? You've already got an answer here, one that you apparently have completely and rudely ignored before re-asking your question. Not good. — Hovercraft Full Of Eels, May 16 '21 at 14:54
I have just started using the website and in my process of understanding. The unnecessary criticism of yours is rude, not my move. @HovercraftFullOfEels — Işıl Berfin Koparan, May 16 '21 at 18:41
Please read the [help] link and go through the [tour] to learn how to best use this site. I say this for your own benefit since if more of your questions on this site are poorly received, you could be banned from asking, something that you'll want to avoid. — Hovercraft Full Of Eels, May 16 '21 at 18:52

score 0 · Accepted Answer · answered May 14 '21 at 14:10

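Calling tfidf_vect.fit_transform(pred_text) refits the vectorizer on the single message you pass in, so it discards the vocabulary it learned from your original training corpus. The resulting matrix only has columns for the terms in that one message, not the 7985 features the model was trained on, which is exactly what the error says.

At prediction time you should only call transform, which reuses the vocabulary learned during training.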
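To see the difference in isolation, here is a minimal sketch. The toy corpus is hypothetical and only stands in for your real training messages, and a fresh TfidfVectorizer is used to mimic what refitting does, so the numbers will not match your 7985 features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the original training messages.
train_texts = [
    "win a free prize now",
    "call now to claim your free cash",
    "are we still meeting for lunch today",
    "how are you doing today",
]

# Fit once on the training corpus; this is where the vocabulary is learned.
tfidf_vect = TfidfVectorizer()
train_matrix = tfidf_vect.fit_transform(train_texts)
n_train_features = train_matrix.shape[1]

message = ["how are you doing today?"]

# transform reuses the learned vocabulary, so the column count matches training.
transformed = tfidf_vect.transform(message)

# Fitting again (shown here on a fresh vectorizer) learns a new vocabulary
# from the single message, so the column count no longer matches.
refit = TfidfVectorizer().fit_transform(message)

print("training features:     ", n_train_features)
print("transform features:    ", transformed.shape[1])
print("fit_transform features:", refit.shape[1])
```

The first two numbers agree, while the third is much smaller. That mismatch is what the model complains about.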

These changes below should help:

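def predict_message(pred_text):
    # The vectorizer expects an iterable of documents, so wrap the string in a list.
    pred_text = [pred_text]
    # Changed: transform (not fit_transform) keeps the vocabulary learned in training.
    pred_text2 = tfidf_vect.transform(pred_text)
    prediction = rf_model.predict(pred_text2)
    return prediction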

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)
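
As an optional alternative (not what the question's code does), you could bundle the vectorizer and the model in a scikit-learn Pipeline, so that predict can only ever call transform on the vectorizer. This is a sketch under assumptions: the toy texts, the labels, the name spam_pipeline and the forest settings are all made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the real corpus and labels are not shown in the question.
train_texts = [
    "win a free prize now",
    "call now to claim your free cash",
    "are we still meeting for lunch today",
    "how are you doing today",
]
train_labels = ["spam", "spam", "ham", "ham"]

# The pipeline fits the vectorizer and the forest together on the training data.
spam_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
spam_pipeline.fit(train_texts, train_labels)


def predict_message(pred_text):
    """Classify one message; predict only transforms with the fitted vectorizer."""
    return spam_pipeline.predict([pred_text])[0]


print(predict_message("how are you doing today?"))
```

With this layout there is no separate vectorizer call left in predict_message, so the fit_transform mistake cannot be repeated there.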

@IşılBerfinKoparan: good, you finally acknowledged the answer — Hovercraft Full Of Eels, May 16 '21 at 19:10
