I built a spam classifier with a random forest and wanted to write a separate function that classifies a text message as spam or ham. I tried:

def predict_message(pred_text):
    pred_text=[pred_text]
    pred_text2 = tfidf_vect.fit_transform(pred_text)
    pred_features = pd.DataFrame(pred_text2.toarray())
    prediction = rf_model.predict(pred_features)
    return (prediction)

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)

but it gives me the error:

The number of features of the model must match the input.
Model n_features is 7985 and input n_features is 1 

I can't see the problem. How can I make it work?

Mustafa Aydın
  • Why did you re-ask this same question on the [meta] meta site? You've already got an answer here, one that you apparently have completely and rudely ignored before re-asking your question. Not good. – Hovercraft Full Of Eels May 16 '21 at 14:54
  • I have just started using the website and in my process of understanding. The unnecessary criticism of yours is rude, not my move. @HovercraftFullOfEels – Işıl Berfin Koparan May 16 '21 at 18:41
  • Please read the [help] link and go through the [tour] to learn how to best use this site. I say this for your own benefit since if more of your questions on this site are poorly received, you could be banned from asking, something that you'll want to avoid. – Hovercraft Full Of Eels May 16 '21 at 18:52

1 Answer

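By calling tfidf_vect.fit_transform(pred_text), you refit the vectorizer on the single new message, so it discards the vocabulary it learned from your original training corpus. The resulting feature matrix no longer has the 7985 columns the model was trained on, which is what the error reports.

Call transform instead. It reuses the already fitted vocabulary, so the new message is encoded with the same columns as the training data.

The changes below should help: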
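
To make the mismatch concrete, here is a minimal sketch that compares column counts. The toy corpus and the names train_texts, vect, X_ok, and X_bad are assumptions for illustration only; they are not from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy training corpus, standing in for the real data.
train_texts = [
    "win a free prize now",
    "call now to claim your free cash",
    "are we still meeting for lunch today",
    "how are you doing",
]

# Fit the vectorizer once on the training corpus.
vect = TfidfVectorizer()
X_train = vect.fit_transform(train_texts)
n_train_features = X_train.shape[1]

new_message = ["how are you doing today?"]

# transform reuses the training vocabulary, so the column count matches.
X_ok = vect.transform(new_message)

# Fitting a fresh vectorizer on the single message learns a new, smaller
# vocabulary, which is what fit_transform does at prediction time.
X_bad = TfidfVectorizer().fit_transform(new_message)

print(n_train_features, X_ok.shape[1], X_bad.shape[1])
```

The first two numbers printed are equal, while the third reflects only the words in the new message.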

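def predict_message(pred_text):
    # The vectorizer expects an iterable of documents, so wrap the string in a list.
    pred_text = [pred_text]
    # transform (not fit_transform) reuses the vocabulary learned during training.
    pred_text2 = tfidf_vect.transform(pred_text)  # Changed
    # The sparse matrix can be passed to the model directly.
    prediction = rf_model.predict(pred_text2)
    return prediction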

pred_text = "how are you doing today?"

prediction = predict_message(pred_text)
print(prediction)
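
For reference, a self-contained sketch of the whole flow might look like the following. The training texts, labels, and hyperparameters are made up for illustration, since the question does not show how tfidf_vect and rf_model were created; only the fit-once, transform-later pattern is the point.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data; the real corpus and labels are not shown in the question.
train_texts = [
    "win a free prize now",
    "call now to claim your free cash",
    "are we still meeting for lunch today",
    "how are you doing",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Fit the vectorizer once, on the training data only.
tfidf_vect = TfidfVectorizer()
X_train = tfidf_vect.fit_transform(train_texts)

# Train the model on the same feature space.
rf_model = RandomForestClassifier(n_estimators=50, random_state=0)
rf_model.fit(X_train, train_labels)


def predict_message(pred_text):
    # Encode the new message with the already fitted vocabulary.
    features = tfidf_vect.transform([pred_text])
    return rf_model.predict(features)


prediction = predict_message("how are you doing today?")
print(prediction)
```

If the vectorizer and model are saved to disk, both need to be saved and loaded together, because the model only makes sense with the vocabulary it was trained on.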
Mike Xydas