
I am trying to perform text classification using machine learning. For that, I have extracted feature vectors from the pre-processed textual data using a simple bag-of-words approach (CountVectorizer) and a TfidfVectorizer.

Now I want to use word2vec (i.e. word embeddings) for my feature vectors, similar to CountVectorizer/TfidfVectorizer: I should be able to learn a vocabulary from the training data and then transform the test data with that learned vocabulary, but I can't find a way to implement this.

# I need something like this with word2vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

count = CountVectorizer()
train_feature_vector = count.fit_transform(train_data)
test_feature_vector = count.transform(test_data)

# So I can train my model like this
mb = MultinomialNB()
mb.fit(train_feature_vector, y_train)
acc_score = mb.score(test_feature_vector, y_test)
print("Accuracy " + str(acc_score))

1 Answer


You should first understand what word embeddings are. When you apply a CountVectorizer or TfidfVectorizer, what you get is a sparse sentence representation over the vocabulary, essentially a (weighted) bag-of-words or one-hot style encoding. A word embedding, by contrast, represents each individual word as a dense, real-valued vector (typically a few hundred dimensions).
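For instance, here is a minimal sketch using gensim 4.x (an assumption; the question doesn't name a library) that fits a word2vec model on the training data only, analogous to calling fit on a CountVectorizer; train_data is assumed to be a list of raw strings as in the question:

# Minimal sketch, assuming gensim 4.x and that train_data is a list of raw strings.
from gensim.models import Word2Vec

tokenized_train = [doc.split() for doc in train_data]  # simple whitespace tokenization

# Learn an embedding per word from the training data only.
w2v = Word2Vec(sentences=tokenized_train, vector_size=100, window=5,
               min_count=2, workers=4)

# Each in-vocabulary word now maps to a dense 100-dimensional vector.
vector = w2v.wv["good"]  # raises KeyError if "good" fell below min_count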

Once you have a per-word representation, there are several ways to combine the word vectors into a single sentence/document vector; see: How to get vector for a sentence from the word2vec of tokens in sentence
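One common option from that thread is to simply average the word vectors of each document. A rough sketch, reusing the w2v model above and assuming the question's train_data, test_data, y_train, and y_test are in scope; note that MultinomialNB expects non-negative counts, so a classifier such as LogisticRegression is a better fit for dense embeddings (this substitutes for the question's MultinomialNB):

import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(tokens, w2v):
    # Average the vectors of in-vocabulary tokens; zeros if none are known.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    if not vecs:
        return np.zeros(w2v.vector_size)
    return np.mean(vecs, axis=0)

# Build fixed-length feature vectors for train and test with the same model,
# mirroring the fit-on-train / transform-on-test pattern from the question.
X_train = np.array([doc_vector(d.split(), w2v) for d in train_data])
X_test = np.array([doc_vector(d.split(), w2v) for d in test_data])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Accuracy " + str(clf.score(X_test, y_test)))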
