I have a list of ~500 embedding vectors. Each vector has length 400, which is too long to post, but here is the start of one of them:
[-1.5425615, -0.52326035, 0.48309317, -1.3839878, -1.3774203, -0.44861528, 3.026304, -0.23582345, 4.3516054, -2.1284392, -3.0056703, 1.4997623, 0.51767087, -2.3668504, 0.9771546, -2.5286832, -1.1869463, -1.2889853, -4.272979...]
There are ~500 of these vectors in a list called list_of_vectors.
There is also a list called list_of_labels, where each vector is assigned a label.
I want to plot them on a t-SNE plot, so I wrote:
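from sklearn.manifold import TSNE
import numpy as np

tsne = TSNE(n_components=2)
# fit_transform expects an array-like of shape (n_samples, n_features)
X_tsne = tsne.fit_transform(np.asarray(list_of_vectors))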
The output is:
So there are ~500 dots in the plot, and each one has one label (from list_of_labels).
You can see the dots are very roughly clustered. I want to add a couple of labels to each rough cluster so I know which cluster is which. Alternatively, can I colour the clusters differently and have a legend showing a sample word from each cluster?
Is there a way for me to annotate/label a couple of the dots in each cluster?
Or is there any method that would add, say, 5 to 10 labels to the graph, so I can understand the plot better?
It doesn't have to be exact; I'm just trying to get a broad understanding of the plot.
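To make the request concrete, here is a rough sketch of the kind of thing I have in mind. It is only one possible approach and rests on assumptions that are not part of my code above: it groups the 2-D t-SNE points with k-means (any clustering method could be swapped in), the function name plot_labelled_clusters and the parameters n_clusters, n_annotate and seed are made up for illustration, and the number of clusters has to be guessed by eye. Since t-SNE does not preserve distances faithfully, the clusters found this way are only a visual aid.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


def plot_labelled_clusters(X_2d, labels, n_clusters=5, n_annotate=2, seed=0):
    """Colour 2-D points by k-means cluster and annotate a few points per cluster.

    Hypothetical helper: X_2d is the t-SNE output, labels is the matching
    list of label strings, n_clusters is a guess chosen by eye.
    """
    X_2d = np.asarray(X_2d)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_2d)

    fig, ax = plt.subplots(figsize=(8, 6))
    annotated = {}
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        # Pick the members closest to the cluster centroid as representatives.
        dist = np.linalg.norm(X_2d[members] - km.cluster_centers_[c], axis=1)
        nearest = members[np.argsort(dist)[:n_annotate]]
        annotated[c] = [labels[i] for i in nearest]

        # One colour per cluster; the legend entry shows one sample label.
        ax.scatter(X_2d[members, 0], X_2d[members, 1], s=15,
                   label=f"{c}: {labels[nearest[0]]}")
        # Write the label text next to each representative point.
        for i in nearest:
            ax.annotate(labels[i], X_2d[i], xytext=(4, 4),
                        textcoords="offset points", fontsize=8)

    ax.legend(title="cluster: sample label")
    return fig, annotated
```

With my data this would be called as `fig, picked = plot_labelled_clusters(X_tsne, list_of_labels, n_clusters=8)` followed by `plt.show()`, where 8 is just a guess. Setting n_annotate=1 with 5 to 10 clusters would give the 5 to 10 labels mentioned above.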