import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

cc_tfid = TfidfVectorizer().fit_transform(cc_corpus)
cc_km = KMeans(n_clusters=3, init='k-means++', max_iter=99, n_init=4, verbose=False)
cc_km.fit(cc_tfid)

plt.scatter(cc_tfid[:, 0], cc_tfid[:, 1])
centroids = cc_km.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=200, alpha=0.5)
plt.show()

I can plot the centroids, but not the points, because the points come from a sparse matrix. How can I plot them?

SarahJessica

1 Answer


You can convert the sparse matrices to dense arrays using .toarray():

plt.scatter(cc_tfid[:, 0].toarray(), cc_tfid[:, 1].toarray())

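Note that projecting all points onto the first two dimensions of the TF-IDF vector space is unlikely to produce a useful plot: each column holds the weight of a single vocabulary term, so you would only be plotting two arbitrary terms against each other. You would be better off piping the data through PCA or t-SNE to reduce the dimensionality to 2.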
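As a rough sketch of the dimensionality-reduction suggestion, the snippet below uses TruncatedSVD rather than PCA, since it accepts sparse input directly; that swap is my assumption, not something the question requires. The toy cc_corpus, the random_state values, and the plot_clusters helper are hypothetical and only there to make the example self-contained.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for cc_corpus from the question.
cc_corpus = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "my cat chased a kitten",
    "stock markets fell today",
    "investors sold stock in falling markets",
    "the market rallied as stock prices rose",
    "the team won the football match",
    "a late goal decided the football game",
    "fans cheered when the team scored a goal",
]

cc_tfid = TfidfVectorizer().fit_transform(cc_corpus)
# random_state is an added assumption so the run is repeatable.
cc_km = KMeans(n_clusters=3, init="k-means++", max_iter=99, n_init=4, random_state=0)
cc_km.fit(cc_tfid)

# TruncatedSVD works on the sparse TF-IDF matrix without densifying it.
svd = TruncatedSVD(n_components=2, random_state=0)
points_2d = svd.fit_transform(cc_tfid)

# Project the centroids with the same fitted transform so that
# points and centroids share one 2-D coordinate system.
centroids_2d = svd.transform(cc_km.cluster_centers_)


def plot_clusters(points, centroids, labels):
    """Scatter the projected points (coloured by cluster) and the centroids."""
    plt.scatter(points[:, 0], points[:, 1], c=labels)
    plt.scatter(centroids[:, 0], centroids[:, 1], c="red", s=200, alpha=0.5)
    plt.show()
```

Calling plot_clusters(points_2d, centroids_2d, cc_km.labels_) then draws the figure. The clustering itself still runs in the full TF-IDF space; only the picture is two-dimensional, so clusters that overlap in the plot may still be well separated in the original space.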

Hristo Iliev