I'm trying to use TfidfVectorizer on an array with a single example and pass the result to a model for prediction, but after TfidfVectorizer I get:

<1x24 sparse matrix of type '<class 'numpy.float64'>'
    with 24 stored elements in Compressed Sparse Row format>

instead of 2x113905, like my x_test or x_train. This is what I did:

labels=df.Label # classification labels
x_train,x_test,y_train,y_test=train_test_split(df['Text'], labels, test_size=0.2, random_state=7) # split the data
print(len(x_train),"\t\t",len(x_test),"\t\t",len(y_train),"\t\t",len(y_test))
my_stopwords_list = stopwords.words('ukrainian')
test = ['Жінка пропагувала "руській мір" на весь вагон: скандал в електричці на Київщині. Інцидент стався в у приміській електричці сполученням "Святошин" - "Тетерів" у понеділок ввечері. Небайдужі пасажири рішуче відреагували й "висадили" жінку на найближчій станції.']
test = pd.Series(test,name="Text")
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words=my_stopwords_list,smooth_idf=False)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train.values.astype('U')) 
tfidf_test=tfidf_vectorizer.transform(x_test.values.astype('U'))
tfidf_train1=tfidf_vectorizer.fit_transform(test.values.astype('U'))

But when I look at tfidf_test and tfidf_train1, I get:

(https://i.stack.imgur.com/Xrzbf.png)

and then I can't use model.predict():

pred = model_PassiveAggressiveClassifier.predict(tfidf_train1)

***ValueError**: X has 24 features, but PassiveAggressiveClassifier is expecting 113905 features as input.*

I tried the same approach as in this Kaggle notebook, but it didn't work. My only clue is that I'm using Ukrainian text, but I don't think that has a big impact.

Bocley

1 Answer

I found the solution in this Stack Overflow question. The problem was that calling fit_transform on my new "test example" built a new vocabulary from that single text, so its matrix had only 24 features instead of 113905.

So I reran these cells, calling transform instead of fit_transform on the new example:

#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words=my_stopwords_list,smooth_idf=False)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train.values.astype('U')) 
tfidf_test=tfidf_vectorizer.transform(x_test.values.astype('U'))
tfidf_train1=tfidf_vectorizer.transform(test)

After that, my "test example" had the expected number of features:

<1x113905 sparse matrix of type '<class 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
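
To make the difference concrete, here is a minimal sketch on toy data. The English sentences, labels, and variable names are made up for illustration and are not from my dataset; only the fit-once, transform-afterwards pattern is the point.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy corpus standing in for x_train / y_train.
train_texts = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]
train_labels = [0, 0, 1, 1]
new_text = ["markets and shares dropped again"]

# Fit the vocabulary once, on the training texts only.
vectorizer = TfidfVectorizer(smooth_idf=False)
X_train = vectorizer.fit_transform(train_texts)
model = PassiveAggressiveClassifier(random_state=7).fit(X_train, train_labels)

# transform() reuses the fitted vocabulary, so the column count matches.
X_new = vectorizer.transform(new_text)
pred = model.predict(X_new)

# A freshly fitted vectorizer learns a vocabulary from the single new text,
# which yields a different column count and triggers the ValueError in predict().
X_refit = TfidfVectorizer(smooth_idf=False).fit_transform(new_text)

print(X_train.shape, X_new.shape, X_refit.shape)
```

Words in the new text that the fitted vocabulary does not contain are simply ignored by transform(), which is why my real example ended up with only 6 stored elements.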
Bocley