What is the best way to visualize a bag-of-words in a scatterplot?

Asked Dec 06 '16 at 19:42

Active Dec 07 '16 at 10:18

Viewed 1,304 times

I have a bag-of-words with about 60,000 features. Each feature represents a dimension. I want to represent this bag-of-words in a reduced 2D space. How do I do it?

I have seen an example here that looks close to what I want, but it is not quite the same. In the example they have two transforms and I have only one, so, as suggested, I do not want to use a pipeline. Below is my code, which is taking forever and does not show any error message:

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# myList contains about 800000 words
bag_of_words = vec.fit_transform(myList)  # vec is defined elsewhere (not shown)
X = bag_of_words.todense()  # this is taking forever
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
plt.scatter(data2D[:, 0], data2D[:, 1])
plt.show()

I have not found any better option and right now it looks like I am doing something wrong.

What is the best way to visualize a bag-of-words in a scatterplot?

The bag_of_words looks like this:

(0, 548)    3
(0, 4000)   6
(0, 15346)  1
(0, 23299)  1
(0, 22931)  2
(0, 32817)  1
(0, 51733)  1
(0, 38308)  6
(0, 14784)  1
(0, 146873) 1
 ....

edited May 23 '17 at 11:52

Community

asked Dec 06 '16 at 19:42

eskoba

Please, I have no problem with getting downvoted, but I would like to know why. – eskoba Dec 07 '16 at 09:35
Which line of your code is taking forever? – Stop harming Monica Dec 07 '16 at 09:56
The line with `X = bag_of_words.todense()`. – eskoba Dec 07 '16 at 10:17
I don't know why that could happen. `.todense()` is always fast with my sparse matrices. Can't help more without having the data. – Stop harming Monica Dec 07 '16 at 13:00
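
For reference, here is a minimal sketch of one way to skip the `.todense()` step that the comments point to. It is not the asker's method: it swaps `PCA` for scikit-learn's `TruncatedSVD`, which accepts scipy sparse input directly. Unlike `PCA`, `TruncatedSVD` does not center the data, so the resulting coordinates may differ from a PCA projection. The `docs` list is a hypothetical stand-in for `myList`, and `CountVectorizer` is an assumption, since the question never shows how `vec` is defined. The sketch also estimates how many bytes a dense copy would need, which may explain the slowdown on a large matrix.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for myList; the real data is not shown in the question.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the log is not the mat",
]

# Assumption: vec is a CountVectorizer (the question does not say).
vec = CountVectorizer()
bag_of_words = vec.fit_transform(docs)  # scipy sparse matrix, shape (n_docs, n_features)

# Rough size of the dense copy that .todense() would have to allocate.
n_rows, n_cols = bag_of_words.shape
dense_bytes = n_rows * n_cols * bag_of_words.dtype.itemsize

# TruncatedSVD works on the sparse matrix as-is, so no dense copy is made.
svd = TruncatedSVD(n_components=2, random_state=0)
data2D = svd.fit_transform(bag_of_words)  # dense array, shape (n_docs, 2)


def plot_2d(points):
    """Scatter-plot the two reduced dimensions (imports matplotlib lazily)."""
    import matplotlib.pyplot as plt

    plt.scatter(points[:, 0], points[:, 1])
    plt.show()
```

Calling `plot_2d(data2D)` draws the scatterplot. With the real data, `dense_bytes` gives a quick check of whether the dense matrix would even fit in memory before trying `PCA` on it.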

