Cosine similarity is widely used for measuring the similarity between two vectors, where the vectors could be word vectors or document vectors.
Other metrics, such as Manhattan, Euclidean, and Minkowski distance, are also popular.
Cosine similarity returns a value in a bounded range (between -1 and 1 in general, or between 0 and 1 for non-negative vectors), so it SEEMS like a percentage of similarity between the two vectors. Euclidean distance, by contrast, returns numbers of widely varying magnitude.
When the cosine similarity between two vectors comes out as 0.78xxx, people (including me) tend to read it as "these two vectors are 78% similar!", which is not what the number actually means; it is not a true "similarity degree" between the two vectors.
Unlike cosine similarity, Minkowski, Manhattan, Canberra, etc. can return large values that are not confined to the range 0 to 1.
For a word1:word2 example:
0.78 (cosine similarity, bounded between 0 and 1)
9.54 (Euclidean, the actual distance between the two vectors)
158.417 (Canberra distance)
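To make the scale differences concrete, here is a minimal Python sketch using SciPy; the vectors u and v are made-up stand-ins for word embeddings, not the actual word1/word2 vectors:

```python
import numpy as np
from scipy.spatial import distance

# Made-up stand-ins for two word embeddings (not the real word1/word2).
u = np.array([0.2, 1.3, 0.5, 2.1])
v = np.array([0.6, 0.9, 1.8, 1.4])

# SciPy's `cosine` returns a *distance* (1 - similarity), so convert back.
cos_sim = 1.0 - distance.cosine(u, v)

print(f"cosine similarity:  {cos_sim:.3f}")                    # bounded
print(f"Euclidean distance: {distance.euclidean(u, v):.3f}")   # unbounded
print(f"Manhattan distance: {distance.cityblock(u, v):.3f}")   # unbounded
print(f"Canberra distance:  {distance.canberra(u, v):.3f}")    # unbounded
```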
I expect that there are normalization methods broadly used to represent the actual "similarity degree" between two vectors. Please share any that you know of; if there are articles or papers on this, that would be even better.
For the same word1:word2 example, the kind of output I am hoping for looks like this (see the sketch after this list for the sort of transform I mean):
0.848 (cosine, transformed into a normalized number)
0.758 (Euclidean, normalized to between 0 and 1)
0.798 (Canberra, normalized to between 0 and 1)
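The only transforms I can come up with on my own are ad-hoc squashings like s = 1/(1 + d) or s = exp(-d). A minimal sketch of what I mean (purely my own illustration, not a method I have seen justified anywhere):

```python
import math

def reciprocal_similarity(d):
    # Ad-hoc squashing: maps a non-negative distance into (0, 1], with d = 0 -> 1.
    return 1.0 / (1.0 + d)

def exp_similarity(d):
    # Alternative ad-hoc squashing: d = 0 -> 1, large d -> ~0.
    return math.exp(-d)

print(reciprocal_similarity(9.54))   # Euclidean example above -> ~0.095
print(exp_similarity(9.54))          # -> ~7.2e-05
```

Neither of these produces numbers like the ones in my example above, and both are sensitive to the scale of the underlying distance, which is exactly why I am asking whether there is a principled, broadly used method.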
I am not looking for softmax here; I have read an article arguing that softmax outputs should not be interpreted as actual percentages.