Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space: the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which requires only multiplication, addition, and a square root to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the Euclidean norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5·5) = 0.48.
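The calculation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not part of the Wikipedia text: the function name `cosine_similarity` and the variable names are chosen here for the example, and raising an error for zero vectors is an assumption about how to handle the undefined case.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat the zero vector as an error,
        # since the angle to a zero vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The worked example from the text: dot product 12, norms 5 and 5.
example = cosine_similarity((0, 3, 4), (-3, 4, 0))

# Scaling one vector changes its magnitude but not its orientation,
# so the similarity stays the same.
scaled = cosine_similarity((0, 30, 40), (-3, 4, 0))

print(example, scaled)
```

Both calls give 0.48 (up to floating-point rounding), which illustrates that the measure depends on orientation rather than magnitude.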

1004 questions
5 votes, 1 answer

Cosine Similarity between columns of two dataframes of differing lengths?

I have a text column in df1 and a text column in df2. The length of df2 will differ from the length of df1. I want to calculate cosine similarity for every entry in df1[text] against every entry in df2[text] and give a score for every…
5 votes, 1 answer

Best way to identify dissimilarity: Euclidean Distance, Cosine Distance, or Simple Subtraction?

I'm new to data science and am currently learning different techniques that I can use with Python. Currently, I'm trying them out with Spotify's API for my own playlists. The goal is to find the most dissimilar features between two different playlists.…
5 votes, 2 answers

Algorithms for matching based on keywords intersection

Suppose we have buyers and sellers that are trying to find each other in a market. Buyers can tag their needs with keywords; sellers can do the same for what they are selling. I'm interested in finding algorithm(s) that rank-order sellers in terms…
John Horton
  • 4,122
  • 6
  • 31
  • 45
5 votes, 1 answer

How to speed up cosine similarity between a numpy array and a very very large matrix?

I have a problem where I need to calculate cosine similarities between a numpy array of shape (1, 300) and a matrix of shape (5000000, 300). I have tried multiple different flavors of code and now I am wondering if there is a way to reduce the run…
ajaanbaahu
  • 344
  • 3
  • 20
5 votes, 3 answers

How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarities between two items? Through reading, I found a few options. However, I am not sure about…
kitchenprinzessin
  • 1,023
  • 3
  • 14
  • 30
5 votes, 1 answer

Spark ml cosine similarity: how to get 1 to n similarity score

I read that I could use the columnSimilarities method that comes with RowMatrix to find the cosine similarity of various records (content-based). My data looks something like this: genre,actor horror,mohanlal shobhana pranav comedy,mammooty suraj…
5 votes, 5 answers

Cosine distance of vector to matrix

In python, is there a vectorized efficient way to calculate the cosine distance of a sparse array u to a sparse matrix v, resulting in an array of elements [1, 2, ..., n] corresponding to cosine(u,v[0]), cosine(u,v[1]), ..., cosine(u, v[n])?
David
  • 1,454
  • 3
  • 16
  • 27
5 votes, 1 answer

Why are Cosine Similarity and TF-IDF used together?

TF-IDF and Cosine Similarity are a commonly used combination for text clustering. Each document is represented by a vector of TF-IDF weights. This is what my textbook says. With Cosine Similarity you can then compute the similarities between…
Evgenij Reznik
  • 17,916
  • 39
  • 104
  • 181
5 votes, 2 answers

PostgreSQL: Find sentences closest to a given sentence

I have a table of images with sentence captions. Given a new sentence I want to find the images that best match it based on how close the new sentence is to the stored old sentences. I know that I can use the @@ operator with a to_tsquery but…
5 votes, 1 answer

Understanding elasticsearch query score explain

I'm trying to decipher the explain API in the elasticsearch response, but I'm a bit lost. It's a bit hard for me to follow. Any simple pointers or links that will explain the JSON more specifically? I have an understanding of TF, IDF and the cosine…
user1189332
  • 1,773
  • 4
  • 26
  • 46
5 votes, 3 answers

How to replace string values in pandas dataframe to integers?

I have a Pandas DataFrame that contains several string values. I want to replace them with integer values in order to calculate similarities. For example: stores[['CNPJ_Store_Code','region','total_facings']].head() Out[24]: CNPJ_Store_Code …
user3318421
  • 91
  • 3
  • 8
5 votes, 2 answers

How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?

Computing the semantic similarity between two synsets in WordNet can be easily done with several built-in similarity measures, such as: synset1.path_similarity(synset2) synset1.lch_similarity(synset2), Leacock-Chodorow…
modarwish
  • 495
  • 10
  • 22
5 votes, 5 answers

Using k-means for document clustering, should clustering be on cosine similarity or on term vectors?

Apologies if the answer to this is obvious, please be kind, this is my first time on here :-) I would gratefully appreciate if someone could give me a steer on the appropriate input data structure for k-means. I am working on a masters dissertation…
5 votes, 1 answer

What's the difference between Pearson correlation similarity and adjusted cosine similarity?

While they are very similar, I am sure there is some difference between Pearson correlation similarity and adjusted cosine similarity, because all the papers and web pages treat them as two different kinds. However, none of them provide a clear…
5 votes, 2 answers

Calculate cosine similarity of two matrices

I have defined two matrices as follows: from scipy import linalg, mat, dot a = mat([-0.711,0.730]) b = mat([-1.099,0.124]) Now, I want to calculate the cosine similarity of these two matrices. What is wrong with the following code? It gives me…
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121