
I have been trying to compute text similarity so that it lies between 0 and 1, interpretable as a probability. The two texts are encoded as two vectors, each a list of numbers in [-1, 1]. Given two such vectors, it seems natural to use cosine similarity to measure vector similarity, but the output of cosine is in the range [-1, 1]. So I'm wondering if there's a method that either: 1) gives a similarity directly in [0, 1], or 2) transforms the cosine similarity to the [0, 1] range. Any ideas?

P.S. Since I work a lot with cosine similarity, I've seen some people suggest converting the cosine distance to a probability, and others suggest mapping every value in [-1, 0] to 0 while keeping values in [0, 1] as they are. Honestly, neither method makes sense to me; I think both distort the underlying notion of similarity. So I'm wondering if there's an elegant method out there that serves this purpose.
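For reference, the two transformations mentioned above (the linear rescale and the clamp-negatives-to-zero mapping) can be sketched as follows; the function names here are my own, not from any library:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rescaled_similarity(u, v):
    """Linearly map cosine similarity from [-1, 1] to [0, 1]:
    s -> (1 + s) / 2. Monotone, but -1 (opposite) maps to 0 and
    0 (orthogonal) maps to 0.5."""
    return (1.0 + cosine_similarity(u, v)) / 2.0

def clipped_similarity(u, v):
    """Clamp negative cosine similarities to 0, keep [0, 1] as-is.
    Loses the distinction between orthogonal and opposite vectors."""
    return max(0.0, cosine_similarity(u, v))
```

Neither mapping produces a calibrated probability; they only guarantee the output lies in [0, 1].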

inverted_index
  • I admittedly don't know anything about cosine similarity, but couldn't you just divide the result by two, to get a result in the range -0.5 to +0.5, and then add 0.5 for 0 to 1? Or use some other way to compress the range and add an offset? – Some programmer dude Nov 16 '20 at 02:24
  • @Someprogrammerdude The method you're referring to seems to be the most accessible way to do this, i.e., rescaling the range, as also mentioned in the link I provided. Nonetheless, I think rescaling would hurt the semantic similarity space; it just doesn't make sense to me to do this. – inverted_index Nov 16 '20 at 02:32
  • @inverted_index I was looking for the same thing, check this paper [1] Eq 5, see how the authors convert word embedding similarity to probability. [1] https://arxiv.org/pdf/1810.12738.pdf – Fuji Sep 13 '21 at 11:08

0 Answers