I'm trying to extract features from a text document. Here is my code:
import sklearn
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
files = sklearn.datasets.load_files('/home/niyas/Documents/project/container', shuffle = False)
vectorizer = CountVectorizer(min_df=1)
X = vectorizer.fit_transform(files.data[1])
Y=vectorizer.get_feature_names()
I'm getting an error "ValueError: empty vocabulary; perhaps the documents only contain stop words". The code works fine when I pass a string with the exact same content of the text doc.
Help me. Thanks in advance.