First, I fit a CountVectorizer on a corpus of SMS messages:
from sklearn.feature_extraction.text import CountVectorizer
clf = CountVectorizer()
X_desc = clf.fit_transform(X).toarray()
It seems to work fine:
X.shape = (5574,)
X_desc.shape = (5574, 8713)
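For reference, here is a small sketch of the same fit step on a hypothetical three-message corpus. The messages are made up as stand-ins, since the real SMS data isn't shown. It illustrates what the shapes mean: one row per document and one column per vocabulary term.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the SMS corpus: a list of strings, one per message
corpus = [
    "Free entry in a weekly competition",
    "Are you coming to the lecture today",
    "Call me when you get home",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)  # sparse matrix, one row per message

print(counts.shape)                 # (3, 17): 3 messages, 17 distinct terms
print(len(vectorizer.vocabulary_))  # 17, equal to counts.shape[1]
```

With the default settings, text is lowercased and only tokens of two or more word characters are kept, so the single-letter word "a" is not part of the vocabulary.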
But then I applied the transform method to a single line of text. As I understand it, the result should have shape (1, 8713), but this is what I get:
str2 = 'Have you visited the last lecture on physics?'
print len(str2), clf.transform(str2).toarray().shape
52 (52, 8713)
What is going on here? One more thing: all the values are zeros.
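A likely explanation: `transform` expects an iterable of documents, and a Python string is itself an iterable of characters. Each character is then treated as a separate document, which gives one row per character (hence the number of rows equals `len(str2)`). The default `token_pattern`, `r"(?u)\b\w\w+\b"`, only keeps tokens of two or more word characters, so no single-character "document" produces a token, and every row is all zeros. Wrapping the string in a list makes it one document. Recent scikit-learn versions raise a `ValueError` for a bare string instead of silently iterating over it, so the sketch below simulates the old behaviour with `list(str2)`. The training corpus here is a hypothetical stand-in for the SMS data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training corpus standing in for the SMS data
corpus = [
    "Have you been to the physics lecture",
    "The last lecture was on chemistry",
]
vectorizer = CountVectorizer()
vectorizer.fit(corpus)

str2 = 'Have you visited the last lecture on physics?'

# Correct usage: wrap the single message in a list so it is ONE document
row = vectorizer.transform([str2]).toarray()
print(row.shape)  # (1, vocabulary_size)

# Simulate passing a bare string: iterating a string yields characters,
# so each character becomes its own "document"
char_docs = list(str2)
char_rows = vectorizer.transform(char_docs).toarray()
print(char_rows.shape)  # (len(str2), vocabulary_size)
print(char_rows.sum())  # 0: single characters never match the default token pattern
```

The same fix applies to the original code: `clf.transform([str2])` should return a single row with 8713 columns.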