Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which can be computed with a few basic arithmetic operations.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the Euclidean norm of a vector is the square root of the sum of its squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
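
The calculation above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `cosine_similarity` is chosen here for the example, and raising an error for a zero vector is one possible convention (the measure is undefined when either norm is 0).

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention for this sketch: reject zero vectors.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))

# Doubling one vector changes its magnitude but not the similarity.
print(cosine_similarity((0, 6, 8), (-3, 4, 0)))
```

Both calls should give 12/25, since scaling a vector changes its magnitude but not its orientation. For large arrays, a vectorized library such as NumPy is usually a better fit than this pure-Python loop.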

1004 questions
4
votes
1 answer

Calculating similarity based on attributes

My objective is to calculate the degree of similarity between two users based on their attributes. For instance let's consider a player and consider age, salary, and points as attributes. Also I want to place weight on each attribute by order of…
user1010101
  • 2,062
  • 7
  • 47
  • 76
4
votes
3 answers

python glove similarity measure calculation

I am trying to understand how python-glove computes most-similar terms. Is it using cosine similarity? Example from python-glove github https://github.com/maciejkula/glove-python/tree/master/glove : I know that from gensim's word2vec, the…
jxn
  • 7,685
  • 28
  • 90
  • 172
4
votes
2 answers

Cosine similarity using TFIDF

There are several questions on SO and the web describing how to take the cosine similarity between two strings, and even between two strings with TFIDF as weights. But the output of a function like scikit's linear_kernel confuses me a…
David
  • 1,454
  • 3
  • 16
  • 27
4
votes
2 answers

How to run a large matrix for cosine similarity in Python?

I want to calculate cosine similarity between articles. And I am running into the problem that my implementation approach would take a long time for the size of the data that I am going to run. from scipy import spatial import numpy as np from…
YAL
  • 651
  • 2
  • 7
  • 22
4
votes
1 answer

cosine distance between two matrices

Take two matrices, arr1 and arr2, of size m×n and p×n respectively. I'm trying to find the cosine distance of their respective rows as an m×p matrix. Essentially I want to take the pairwise dot product of the rows, then divide by the outer product of…
Kevin Johnson
  • 820
  • 11
  • 24
4
votes
2 answers

To find cosine similarity between two strings (names)

I am using Python and scikit-learn to find the cosine similarity between two strings (specifically, names). The program is able to find the similarity score between two strings, but when strings are abbreviated, it shows some undesirable output. e.g-…
4
votes
1 answer

Spark Cosine Similarity (DIMSUM algorithm ) sparse input file

I was wondering whether it would be possible for Spark Cosine Similarity to work with Sparse input data? I have seen examples wherein the input consists of lines of space-separated features of the form: id feat1 feat2 feat3 ... but I have an…
anonuser0428
  • 11,789
  • 22
  • 63
  • 86
4
votes
0 answers

Sparse vector dot product with mongo aggregate

I am having documents with an attached sparse vector like this: { "_id" : ObjectId "vec" : [ { "dim" : 1, "weight" : 8 }, { "dim" : 3, "weight" : 3 } ] } I am trying to get the normalised dot…
4
votes
1 answer

some questions on cosine similarity

Yesterday I learnt that the cosine similarity, defined as cos(θ) = (A · B) / (‖A‖ ‖B‖), can effectively measure how similar two vectors are. I find that the definition here uses the L2-norm to normalize the dot product of A and B. What I am interested in is why not use the…
4
votes
1 answer

tm.package: findAssocs vs Cosine

I'm new here and my question is of a mathematical rather than programming nature; I would like to get a second opinion on whether my approach makes sense. I was trying to find associations between words in my corpus using the function…
IVR
  • 1,718
  • 2
  • 23
  • 41
4
votes
1 answer

Cosine Similarity PHP

I want to calculate the cosine similarity between 1 (ID1) and 3 (ID1) in PHP, similarly for 1 and 4, 3 and 4. formula would be something like this: similarity = (1.1 * 3.1 + 1.4 * 3.4)/(((1.1)^2+(1.3)^2+(1.4)^2)^0.5)(((3.1)^2+ (3.4)^2)^0.5) =…
user2044770
  • 79
  • 2
  • 12
4
votes
1 answer

how can I implement the tf-idf and cosine similarity in Lucene?

How can I implement the tf-idf and cosine similarity in Lucene? I'm using Lucene 4.2. The program that I've created does not use tf-idf and cosine similarity, it only uses TopScoreDocCollector. import com.mysql.jdbc.Statement; import…
Tia Chandrawati
  • 63
  • 1
  • 2
  • 6
3
votes
1 answer

Interpretation of cosine similarity and jaccard similarity (similarity of histograms)

Introduction I would like to assess the similarity between two "bin counts" arrays (related to two histograms), by using the Matlab "pdist2" function: % Input bin_counts_a = [689 430 311 135 66 67 99 23 37 19 8 4 …
limone
  • 279
  • 2
  • 9
3
votes
1 answer

How can I find the cosine similarity between two song lyrics represented as strings?

My friends and I are doing an NLP project on song recommendation. Context: We originally planned on giving the model a recommended song playlist that has the most similar lyrics based on the random input corpus (from the literature etc.), however we…
3
votes
0 answers

How do I group words that are in a list based on cosine similarity?

I have a list of strings: my_list = ['policeman', 'police officers', 'police force',..]. The length of the list is around 2000. I want to group those words based on cosine similarity. If the cosine similarity is above 0.7, I want to group them together.…
Fio
  • 31
  • 1