I have this pipeline that I use for a text classifier, and it works fine:
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
lr = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', LogisticRegression(max_iter=1000)),
])
lr.fit(X_train, y_train)
y_pred1 = lr.predict(X_test)
However, when I try to use the name 'vect' to transform new text for prediction, I am told that 'vect' is not defined:
news = ["A phase two clinical trial found the shot combined with immunotherapy drug Merck slashed the risk of melanoma returning by 44 percent compared to using the drug alone. Preliminary findings were published in December but had not been reviewed and confirmed by other scientists."]
x_new_counts = vect.transform(news)
x_new_tf = tfidf.transform(x_new_counts)
predicted = clf.predict(x_new_tf)
for doc, category in zip(news, predicted):
    print(category)
# Error message: NameError: name 'vect' is not defined
How is that possible when vect is defined in the pipeline?
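
For what it's worth, the strings 'vect', 'tfidf' and 'clf' are only labels stored inside the Pipeline object; they never become Python variables, which would explain the NameError. Below is a minimal sketch of two ways around it, assuming a recent scikit-learn. The tiny training set and its labels are made up for illustration (the real X_train and y_train are not shown in the question), and the news text is shortened.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data, invented purely for illustration.
X_train = [
    "clinical trial shows the vaccine reduces melanoma risk",
    "new immunotherapy drug approved after clinical trial",
    "the team won the championship game last night",
    "striker scores twice in the cup final game",
]
y_train = ["health", "health", "sport", "sport"]

lr = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
lr.fit(X_train, y_train)

news = ["A phase two clinical trial found the shot slashed the risk of melanoma returning."]

# Option 1: let the fitted pipeline run every step on the raw text.
predicted = lr.predict(news)

# Option 2: look up the fitted steps by their labels, then chain them by hand.
vect = lr.named_steps["vect"]
tfidf = lr.named_steps["tfidf"]
clf = lr.named_steps["clf"]
x_new_counts = vect.transform(news)
x_new_tf = tfidf.transform(x_new_counts)
predicted_manual = clf.predict(x_new_tf)

for doc, category in zip(news, predicted):
    print(category)
```

Both options should give the same prediction. Option 1 is usually simpler, since the pipeline already applies the vectorizer and the TF-IDF transform before the classifier.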