I have this pipeline that I used for a text classifier, and it works fine.

from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

from sklearn.pipeline import Pipeline
lr = Pipeline([('vect', CountVectorizer()),
           ('tfidf', TfidfTransformer()),
           ('clf', LogisticRegression(max_iter = 1000)),
          ])

lr.fit(X_train,y_train)
y_pred1 = lr.predict(X_test)

The problem is that when I try to use the name 'vect' to predict on new text, I am told that 'vect' is not defined.

news = ["A phase two clinical trial found the shot combined with immunotherapy drug Merck slashed the risk of melanoma returning by 44 percent compared to using the drug alone. Preliminary findings were published in December but had not been reviewed and confirmed by other scientists."]

x_new_counts = vect.transform(news)
x_new_tf = tfidf.transform(x_new_counts)

predicted = clf.predict(x_new_tf)

for doc, category in zip(news, predicted):
    print(category)

# error message: name 'vect' is not defined

How is that possible when vect is defined in the pipeline?

Barri

1 Answer

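You are trying to access 'vect' directly, but 'vect' is only the name of a step inside the pipeline, not a Python variable in your scope. Use the pipeline object lr instead: calling lr.predict on the raw text runs the vectorizer, the TF-IDF transformer, and the classifier in sequence.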

news = ["A phase two clinical trial found the shot combined with immunotherapy drug Merck slashed the risk of melanoma returning by 44 percent compared to using the drug alone. Preliminary findings were published in December but had not been reviewed and confirmed by other scientists."]

predicted = lr.predict(news)

for doc, category in zip(news, predicted):
    print(category)
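
If you do want the individual fitted steps (for example, to reproduce the manual transform-then-predict code from the question), one way that should work is the pipeline's named_steps attribute. This is a minimal sketch, not your actual setup: the toy X_train and y_train values and the shortened news text are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, for illustration only).
X_train = [
    "the vaccine trial reduced the risk of melanoma",
    "doctors published new clinical findings on the drug",
    "the team won the championship game last night",
    "the striker scored twice in the final match",
]
y_train = ["health", "health", "sport", "sport"]

lr = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
lr.fit(X_train, y_train)

# The step names are keys into the fitted pipeline, not Python variables,
# so bind them to variables explicitly before using them.
vect = lr.named_steps["vect"]
tfidf = lr.named_steps["tfidf"]
clf = lr.named_steps["clf"]

news = ["a clinical trial found the drug reduced the risk of melanoma"]

# Manual, step-by-step prediction using the fitted components.
x_new_counts = vect.transform(news)
x_new_tf = tfidf.transform(x_new_counts)
predicted_manual = clf.predict(x_new_tf)

# The same prediction through the pipeline in a single call.
predicted_pipeline = lr.predict(news)

print(predicted_manual, predicted_pipeline)
```

Both approaches should give the same prediction, since lr.predict applies the same fitted steps in order. Unless you need the intermediate matrices, calling lr.predict directly is simpler.
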
Saxtheowl