I was trying to run a fake news detection program. This is my code (only the part that raises the error):

# DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
# DataFlair - Fit and transform train set, transform test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

and I get this error:

ValueError                                Traceback (most recent call last)

<ipython-input-19-bd6e732b0b7b> in <module>()
      3 
      4 #DataFlair - Fit and transform train set, transform test set
----> 5 tfidf_train=tfidf_vectorizer.fit_transform(x_train)
      6 tfidf_test=tfidf_vectorizer.transform(x_test)

4 frames

/usr/local/lib/python3.7/dist-packages/sklearn/feature_extraction/text.py in decode(self, doc)
    225         if doc is np.nan:
    226             raise ValueError(
--> 227                 "np.nan is an invalid document, expected byte or unicode string."
    228             )
    229 


ValueError: np.nan is an invalid document, expected byte or unicode string.
tdelaney
  • Since the problem is the `x_train` data, you really should have shown us how that was created. – Tim Roberts Apr 29 '22 at 05:27
  • Does this answer your question? [TfidfVectorizer in scikit-learn : ValueError: np.nan is an invalid document](https://stackoverflow.com/questions/39303912/tfidfvectorizer-in-scikit-learn-valueerror-np-nan-is-an-invalid-document) – Evgeny Kovalev Apr 29 '22 at 08:05
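
The traceback shows the check `if doc is np.nan`, so at least one entry in `x_train` is a missing value rather than a string. Below is a minimal sketch of one way to handle that, assuming `x_train` and `x_test` are pandas Series of text (the question does not show how they were built). The sample data is made up for illustration, and replacing missing documents with an empty string is only one option; dropping those rows also works, as long as the matching labels are dropped too.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for x_train / x_test; the real data is not shown in the question.
x_train = pd.Series(["the president said hello", np.nan, "aliens landed in ohio today"])
x_test = pd.Series(["president landed today", np.nan])

# Count the missing documents. Any non-zero count triggers the ValueError.
n_missing = x_train.isna().sum()

# Replace missing documents with an empty string, then make sure every entry is a str.
# fillna must come first: astype(str) alone would turn NaN into the word "nan".
x_train_clean = x_train.fillna("").astype(str)
x_test_clean = x_test.fillna("").astype(str)

# Same vectorizer settings as in the question.
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(x_train_clean)
tfidf_test = tfidf_vectorizer.transform(x_test_clean)
```

Running `x_train.isna().sum()` on the real data first should confirm whether missing values are the cause before changing anything else.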