I have clustered different texts into 15 clusters.

The texts are in the form:

"Oreo Biscuit is good"
"Healthy Breakfast"
"Cars are fast"
....

I converted the texts into Word2Vec vectors of 100 dimensions. Now I have 15 clusters, which I want to plot.

Instead of plotting all the points, I want to plot 1 point for each cluster, so that there would be 15 points in the plot. How do I do that?

Ideas:

1) Use the cluster centre to plot each cluster.

Is there any other way (for example, converting all the Word2Vec vectors in a cluster into Doc2Vec)? Or can MDS (multidimensional scaling) be used to plot the clusters?

Thank you

Jerry George
  • Let's try to understand your problem. Let's assume that you have 1500 texts and have already found 15 clusters for them. Could you please tell us how you found these clusters? – Abhishek Mishra Jun 11 '18 at 04:25
  • I converted all the texts to Word2Vec and used K-means clustering with K=15. – Jerry George Jun 11 '18 at 04:35
  • The texts may have different numbers of words, so the Word2Vec representation of each text can have a different length. How did you overcome this problem? – Abhishek Mishra Jun 11 '18 at 04:40
  • I used Spark Word2Vec, where I set the dimension of the vector to 100 for each text [collection of tokens]. – Jerry George Jun 11 '18 at 05:02

1 Answer

You already have a vector representation of each text, and you also have the clusters for these texts. You have the following straightforward options:

  1. Plot just the cluster centroids after applying a dimensionality reduction technique. (Pro: simple. Con: carries no information about the quality of each individual cluster.)
  2. Plot the cluster centroids as before, but this time add the variance as a third dimension using a bubble plot, as shown here. (Pro: shows both mean and variance. Con: K-means is too simple.)
  3. Apply a spectral clustering approach first, and then apply the methods above on top of that.
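The first two options can be sketched roughly as follows. This is only an illustration under several assumptions: the 100-dimensional text vectors are available as a NumPy array, the K-means labels as an integer array, and PCA (computed here via SVD) stands in for the unspecified dimensionality reduction step; MDS or another method could be swapped in. The function names `summarize_clusters`, `project_to_2d`, and `plot_clusters` are hypothetical, and the data below is synthetic.

```python
import numpy as np


def summarize_clusters(vectors, labels):
    """Return cluster ids, centroids, and mean squared distance to each centroid."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.vstack([vectors[labels == k].mean(axis=0) for k in ids])
    spread = np.array([
        ((vectors[labels == k] - c) ** 2).sum(axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    return ids, centroids, spread


def project_to_2d(points):
    """Project points onto their first two principal components (PCA via SVD)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def plot_clusters(ids, coords, spread):
    """Bubble plot: one marker per cluster, marker area scaled by its spread."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    sizes = 200 + 1800 * spread / max(spread.max(), 1e-12)
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], s=sizes, alpha=0.5)
    for k, (x, y) in zip(ids, coords):
        ax.annotate(str(k), (x, y), ha="center", va="center")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    return fig


# Synthetic stand-in for the real data (assumption): 1500 vectors, 15 clusters.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(15), 100)
vectors = rng.normal(size=(15, 100))[labels] + 0.3 * rng.normal(size=(1500, 100))

ids, centroids, spread = summarize_clusters(vectors, labels)
coords = project_to_2d(centroids)  # 15 rows, one 2-D point per cluster
```

Calling `plot_clusters(ids, coords, spread)` would then draw 15 bubbles, one per cluster, with larger bubbles for clusters whose members lie further from their centroid. Note that PCA is fitted on the 15 centroids only, so the picture shows how the centroids relate to each other, not how individual texts are distributed.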
Abhishek Mishra