I'm trying to apply TfidfVectorizer to an array containing a single example and use the result for model prediction, but after TfidfVectorizer I get:
<1x24 sparse matrix of type '<class 'numpy.float64'>'
with 24 stored elements in Compressed Sparse Row format>
instead of 2x113905 like my x_test or x_train. Here is what I did:
labels=df.Label #classification labels
x_train,x_test,y_train,y_test=train_test_split(df['Text'], labels, test_size=0.2, random_state=7) #split the data
print(len(x_train),"\t\t",len(x_test),"\t\t",len(y_train),"\t\t",len(y_test))
my_stopwords_list = stopwords.words('ukrainian')
test = ['Жінка пропагувала "руській мір" на весь вагон: скандал в електричці на Київщині. Інцидент стався в у приміській електричці сполученням "Святошин" - "Тетерів" у понеділок ввечері. Небайдужі пасажири рішуче відреагували й "висадили" жінку на найближчій станції.']
test = pd.Series(test,name="Text")
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words=my_stopwords_list,smooth_idf=False)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train.values.astype('U'))
tfidf_test=tfidf_vectorizer.transform(x_test.values.astype('U'))
tfidf_train1=tfidf_vectorizer.fit_transform(test.values.astype('U'))
But when I look at tfidf_test and tfidf_train1, I get:
(https://i.stack.imgur.com/Xrzbf.png)
and then I can't use model.predict():
pred = model_PassiveAggressiveClassifier.predict(tfidf_train1)
***ValueError**: X has 24 features, but PassiveAggressiveClassifier is expecting 113905 features as input.*
I tried the same approach as in this Kaggle work, but it didn't work. I have only one clue: I use Ukrainian text, but I don't think that has a big impact.
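For reference, here is a minimal sketch of the shape mismatch described above. It assumes the cause is the second `fit_transform` call, which would re-fit the vectorizer on the single example and rebuild its vocabulary from that text alone. The toy English sentences and the variable names are made up for illustration and are not from my dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for x_train
train_texts = [
    "the cat sat on the mat",
    "dogs chase cats",
    "fake news spreads fast",
    "real news is verified",
]
# Hypothetical single example standing in for `test`
new_text = ["cats chase fake news"]

# Fit once on the training data; this fixes the vocabulary (the columns)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# transform() reuses the fitted vocabulary, so the column count matches X_train
X_new = vectorizer.transform(new_text)

# fit_transform() on the single example learns a new, much smaller vocabulary
X_refit = TfidfVectorizer().fit_transform(new_text)

print("train:", X_train.shape)
print("transform on new text:", X_new.shape)
print("fit_transform on new text:", X_refit.shape)
```

If that assumption holds, passing the output of `transform` (rather than `fit_transform`) to `model.predict()` should give the classifier the number of features it was trained on.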