I have a bag-of-words representation with about 60000 features. Each feature represents a dimension. I want to represent this bag-of-words in a reduced 2D space. How do I do it?

I have seen an example here, which looks close to what I want but is not quite the same. In the example they have two transforms, and I only have one. Therefore, as suggested, I do not want to use a pipeline. Below is my code, which is taking forever and does not show any error message:

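from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# myList contains about 800000 words
# vec is a vectorizer created earlier (not shown here)
bag_of_words = vec.fit_transform(myList)
X = bag_of_words.todense()  # this is taking forever
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
plt.scatter(data2D[:, 0], data2D[:, 1])
plt.show()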

I have not found any better option, and right now it looks like I am doing something wrong.

What is the best way to visualize a bag-of-words in a scatterplot?

The bag_of_words looks like this:

(0, 548)    3
(0, 4000)   6
(0, 15346)  1
(0, 23299)  1
(0, 22931)  2
(0, 32817)  1
(0, 51733)  1
(0, 38308)  6
(0, 14784)  1
(0, 146873) 1
 ....
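
For reference, one direction that might avoid the slow `todense()` step is to reduce the sparse matrix directly. The sketch below assumes `vec` is a scikit-learn `CountVectorizer` (the question does not show how it was created) and uses a tiny made-up corpus named `docs` in place of `myList`. It swaps `PCA` for `TruncatedSVD`, which accepts sparse input; note that `TruncatedSVD` does not center the data, so the result is not identical to PCA.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy corpus standing in for myList (an assumption for illustration).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "a log is not a mat",
]

# Assumed vectorizer; the question does not say which one vec is.
vec = CountVectorizer()
bag_of_words = vec.fit_transform(docs)  # stays a SciPy sparse matrix

# TruncatedSVD works on the sparse matrix directly, so no todense() call is needed.
svd = TruncatedSVD(n_components=2, random_state=0)
data2D = svd.fit_transform(bag_of_words)  # dense array, one 2D point per document

print(data2D.shape)
```

The resulting `data2D` array can be passed to `plt.scatter(data2D[:, 0], data2D[:, 1])` in the same way as in the code above. Whether this is fast enough for the full data set is untested here.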