
I have a dictionary called "topic_word":

topic_word = {0: [[-0.669712, 0.6868, 0.9821409999999999], [-0.925967, 0.6138399999999999, 1.247525], [-1.09941, 1.0252620000000001, 1.327866]], 
1: [[-0.862131, 0.890915, 1.07759], [-0.437658, 0.279271, 0.627497], [-0.437658, 0.279271, 0.627497]], 
2: [[-0.671647, 0.670583, 0.937155], [-0.675347, 0.466983, 0.8505440000000001], [-0.706244, 0.612532, 0.762877]], 
3: [[-0.8414590000000001, 0.797826, 1.124295], [-0.567535, 0.40820300000000004, 0.811368], [-0.800963, 0.699767, 0.9237989999999999]], 
4: [[-0.8560549999999999, 1.0617020000000001, 1.579302], [-0.576105, 0.5029239999999999, 0.9392], [-0.743683, 0.69884, 0.9794930000000001]]
}

Each key represents a topic (here 0 to 4, so 5 topics) and each value holds the embeddings of the words under that topic (here every topic has 3 words).
I want to visualize the data using a 2-D scatter plot.
If normalization is needed, how can I normalize the "topic_word" data so that it is represented correctly in Python 3.x?

How can I visualize it using a scatter plot that shows the clusters of words (dots) under their topics?
Something like the following:

import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

for key, value in topic_word.items():
   ax.scatter(value[0],value[1],label=key)
plt.legend()
Sergey Bushmanov
iforcebd
  • What is the question here? The normalization procedure or the generation of the scatter plot? Anyhow, both are tasks, not questions. Please describe your code attempt, the expected output, and how the actual output differed. – Mr. T Oct 14 '20 at 11:08
  • Thank you for your reply; I am very new to this. Both aspects you mentioned are part of my task, because if I can't normalize the values to x, y coordinates I can't generate the plot. If I directly apply "plt.scatter(value[0],value[1],label=key)" to "topic_word" for each topic, it only takes the values at positions 0 and 1 and leaves out the third list. For 0: [[-0.669712, 0.6868, 0.9821409999999999], [-0.925967, 0.6138399999999999, 1.247525], [-1.09941, 1.0252620000000001, 1.327866]], it ignores the last element "[-1.09941, 1.0252620000000001, 1.327866]". – iforcebd Oct 14 '20 at 11:31
  • Just as we can use StandardScaler.fit_transform(value) with a dataframe, how can I normalize the data so that I can use "plt.scatter(value[0],value[1],label=key)" to visualize "topic_word" correctly? If it doesn't need normalization, what is the procedure to visualize it directly? Please feel free to ask for clarification if I have confused you. Thank you very much for your valuable time. – iforcebd Oct 14 '20 at 11:31
  • Does this answer your question? [Visualise word2vec generated from gensim](https://stackoverflow.com/questions/43776572/visualise-word2vec-generated-from-gensim) – Sergey Bushmanov Oct 14 '20 at 15:51
  • Similar, but in addition to what that question covers I need topic-wise clustering. – iforcebd Oct 14 '20 at 18:33
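
Since the comments mention StandardScaler.fit_transform, here is a minimal sketch of how the dictionary could be flattened into one array (one row per word) and standardized column-wise. The helper names stack_topics and standardize are made up for illustration; the z-score below uses the population standard deviation, which to my understanding matches what scikit-learn's StandardScaler computes with its defaults.

```python
import numpy as np

# Data copied from the question.
topic_word = {
    0: [[-0.669712, 0.6868, 0.9821409999999999], [-0.925967, 0.6138399999999999, 1.247525], [-1.09941, 1.0252620000000001, 1.327866]],
    1: [[-0.862131, 0.890915, 1.07759], [-0.437658, 0.279271, 0.627497], [-0.437658, 0.279271, 0.627497]],
    2: [[-0.671647, 0.670583, 0.937155], [-0.675347, 0.466983, 0.8505440000000001], [-0.706244, 0.612532, 0.762877]],
    3: [[-0.8414590000000001, 0.797826, 1.124295], [-0.567535, 0.40820300000000004, 0.811368], [-0.800963, 0.699767, 0.9237989999999999]],
    4: [[-0.8560549999999999, 1.0617020000000001, 1.579302], [-0.576105, 0.5029239999999999, 0.9392], [-0.743683, 0.69884, 0.9794930000000001]],
}


def stack_topics(topic_word):
    """Hypothetical helper: return (X, labels) with one row per word."""
    X = np.array([vec for vecs in topic_word.values() for vec in vecs], dtype=float)
    labels = np.array([key for key, vecs in topic_word.items() for _ in vecs])
    return X, labels


def standardize(X):
    """Hypothetical helper: column-wise z-score (zero mean, unit population std)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against division by zero for constant columns
    return (X - X.mean(axis=0)) / std


X, labels = stack_topics(topic_word)
X_scaled = standardize(X)
print(X_scaled.shape)  # (15, 3): 15 words, 3 embedding dimensions
```

Keeping the topic keys in a separate labels array means each row can still be colored by topic after scaling.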

1 Answer


I gather from your post that you want normalized values for each list corresponding to a key, with each normalized list plotted as scatter data points. Here's one way to do it:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
topic_word = {0: [[-0.669712, 0.6868, 0.9821409999999999], [-0.925967, 0.6138399999999999, 1.247525], [-1.09941, 1.0252620000000001, 1.327866]], 
1: [[-0.862131, 0.890915, 1.07759], [-0.437658, 0.279271, 0.627497], [-0.437658, 0.279271, 0.627497]], 
2: [[-0.671647, 0.670583, 0.937155], [-0.675347, 0.466983, 0.8505440000000001], [-0.706244, 0.612532, 0.762877]], 
3: [[-0.8414590000000001, 0.797826, 1.124295], [-0.567535, 0.40820300000000004, 0.811368], [-0.800963, 0.699767, 0.9237989999999999]], 
4: [[-0.8560549999999999, 1.0617020000000001, 1.579302], [-0.576105, 0.5029239999999999, 0.9392], [-0.743683, 0.69884, 0.9794930000000001]]
}
colorkey = {0: 'red', 1: 'blue', 2: 'green', 3: 'black', 4: 'magenta'}  # one color per topic key
for key, value in topic_word.items():
    for valno, val in enumerate(value):  # valno counts the lists under each topic (key)
        val = np.asarray(val)
        val = (val - np.mean(val)) / np.std(val)  # z-score normalize the list
        # Label only the first list of each topic so the legend has no duplicate entries
        ax.scatter(key * np.ones(len(val)), val, color=colorkey[key],
                   label="Topic " + str(key) if valno == 0 else "")
ax.legend(loc='best')  # automatic 'best' placement is an Axes legend feature
plt.show()

  


Sameeresque
  • Thank you very much for your kind effort and valuable time. I have one question about the plot: under every key (topics 0 to 4) we have three words (three lists of values for each key), so why does it show 6 dots for each category instead of three? – iforcebd Oct 14 '20 at 18:41
  • Can we represent it something like this, but showing clusters (topic-wise words)? https://stackoverflow.com/questions/43776572/visualise-word2vec-generated-from-gensim – iforcebd Oct 14 '20 at 18:44
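
On the dot count: the answer's code draws one dot per vector component at x = topic key, so each topic gets 3 words × 3 components = 9 dots, some of which may overlap. If the goal is one dot per word, grouped by topic, one possible approach (not the answer's method) is to project each 3-D embedding to 2-D first. The sketch below does this with PCA computed through NumPy's SVD; the function name project_to_2d and the axis labels are illustrative choices, and PCA is just one of several reasonable projections.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data copied from the question.
topic_word = {
    0: [[-0.669712, 0.6868, 0.9821409999999999], [-0.925967, 0.6138399999999999, 1.247525], [-1.09941, 1.0252620000000001, 1.327866]],
    1: [[-0.862131, 0.890915, 1.07759], [-0.437658, 0.279271, 0.627497], [-0.437658, 0.279271, 0.627497]],
    2: [[-0.671647, 0.670583, 0.937155], [-0.675347, 0.466983, 0.8505440000000001], [-0.706244, 0.612532, 0.762877]],
    3: [[-0.8414590000000001, 0.797826, 1.124295], [-0.567535, 0.40820300000000004, 0.811368], [-0.800963, 0.699767, 0.9237989999999999]],
    4: [[-0.8560549999999999, 1.0617020000000001, 1.579302], [-0.576105, 0.5029239999999999, 0.9392], [-0.743683, 0.69884, 0.9794930000000001]],
}


def project_to_2d(X):
    """Hypothetical helper: project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)  # center each column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)  # rows of vt are principal axes
    return Xc @ vt[:2].T


# One row per word, plus the topic key for each row.
X = np.array([vec for vecs in topic_word.values() for vec in vecs], dtype=float)
labels = np.array([key for key, vecs in topic_word.items() for _ in vecs])
points = project_to_2d(X)

fig, ax = plt.subplots()
for key in topic_word:
    mask = labels == key  # select the words belonging to this topic
    ax.scatter(points[mask, 0], points[mask, 1], label=f"Topic {key}")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to render the figure. Each topic contributes exactly three points here; topic 1 contains two identical vectors in the question's data, so two of its dots coincide.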