I have a bag-of-words consisting of about 60000 features. Each feature represents a dimension. I want to represent this bag-of-words in a reduced 2D space. How do I do it?
I have seen an example here, which looks close to what I want but is not quite the same. In the example they have two transforms and I only have one, so, as suggested there, I do not want to use a pipeline. Below is my code, which takes forever and does not show any error message:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# myList contains about 800000 words
bag_of_words = vec.fit_transform(myList)
X = bag_of_words.todense()  # this is taking forever
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
plt.scatter(data2D[:,0], data2D[:,1])
plt.show()
I have not found any better option and right now it looks like I am doing something wrong.
What is the best way to visualize a bag-of-words in a scatterplot?
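For reference, here is a minimal sketch of one way to get 2D coordinates without calling todense(). It swaps PCA for scikit-learn's TruncatedSVD, which accepts sparse input directly. Note that TruncatedSVD does not center the data, so the result is not identical to PCA. The function name reduce_to_2d, the toy corpus, and the choice of CountVectorizer are assumptions for illustration and are not taken from my real code.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer


def reduce_to_2d(docs, random_state=0):
    """Project bag-of-words counts onto 2 components without densifying.

    `docs` is a list of strings; the vectorizer choice is an assumption.
    """
    vec = CountVectorizer()
    # Sparse matrix of shape (n_docs, n_features); it stays sparse throughout.
    bag_of_words = vec.fit_transform(docs)
    svd = TruncatedSVD(n_components=2, random_state=random_state)
    # fit_transform works directly on the sparse matrix.
    return svd.fit_transform(bag_of_words)


# Hypothetical toy corpus, only to show the shapes involved.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "logs and mats are objects",
]
data2D = reduce_to_2d(docs)
print(data2D.shape)
```

The returned array has one row per document and two columns, so it can be passed to plt.scatter(data2D[:, 0], data2D[:, 1]) in the same way as in my code above.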
The bag_of_words looks like this:
(0, 548) 3
(0, 4000) 6
(0, 15346) 1
(0, 23299) 1
(0, 22931) 2
(0, 32817) 1
(0, 51733) 1
(0, 38308) 6
(0, 14784) 1
(0, 146873) 1
....
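Each printed line above is a (row, column) index followed by the stored count, and only non-zero entries are kept. The sketch below builds a tiny sparse matrix with the same layout and estimates how much memory a dense copy would need. It assumes myList has about 800000 entries (so the matrix has that many rows), 60000 columns, and 8 bytes per value; the toy matrix and the helper dense_size_bytes are hypothetical.

```python
from scipy.sparse import csr_matrix

# Hypothetical toy data in the same (row, column) -> count layout.
rows = [0, 0, 1]
cols = [1, 3, 2]
counts = [3, 6, 1]
small = csr_matrix((counts, (rows, cols)), shape=(2, 5))

# Only the non-zero entries are stored.
print(small.nnz)


def dense_size_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed if every cell, including zeros, were stored."""
    return n_rows * n_cols * itemsize


# Assumed dimensions: 800000 rows, 60000 features, 8-byte values.
estimate = dense_size_bytes(800_000, 60_000)
print(estimate / 1e9, "GB")
```

Under those assumptions the dense matrix would need about 384 GB, which may explain why the todense() call never finishes.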