Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which takes only basic arithmetic to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
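
As a compact restatement of the definition (the symbols a, b, and θ are notation chosen here, with a and b assumed to be nonzero vectors and θ the angle between them):

```latex
\[
  \operatorname{sim}(a, b) \;=\; \cos\theta \;=\; \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert},
  \qquad -1 \le \operatorname{sim}(a, b) \le 1 .
\]
```

Scaling either vector by a positive constant multiplies the numerator and the denominator by the same factor, which is why the value depends only on orientation.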

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (each norm being the square root of the sum of the squared components). For instance, the vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
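
The worked example can be checked with a minimal Python sketch. The function name `cosine_similarity` is chosen here for illustration and is not taken from any particular library; it uses only the standard library and treats zero vectors as an error, since the measure is undefined for them.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors from the example: dot product 12, both norms 5.
similarity = cosine_similarity((0, 3, 4), (-3, 4, 0))
print(round(similarity, 2))  # 0.48
```

For large arrays a vectorized library would normally be preferable; the pure-Python version is only meant to mirror the arithmetic step by step.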

1004 questions

1 answer

Cosine Similarity between columns of two dataframes of differing lengths?

I have a text column in df1 and a text column in df2. The length of df2 will be different from that of df1. I want to calculate cosine similarity for every entry in df1[text] against every entry in df2[text] and give a score for every…

asked Dec 31 '19 at 10:15

Python Learner

1 answer

Best way to identify dissimilarity: Euclidean Distance, Cosine Distance, or Simple Subtraction?

I'm new to data science and am currently learning different techniques that I can do with Python. Currently, I'm trying it out with Spotify's API for my own playlists. The goal is to find the most dissimilar features between two different playlists.…

pandas data-science similarity euclidean-distance cosine-similarity

asked Nov 06 '18 at 14:13

Mustafa

2 answers

Algorithms for matching based on keywords intersection

Suppose we have buyers and sellers that are trying to find each other in a market. Buyers can tag their needs with keywords; sellers can do the same for what they are selling. I'm interested in finding algorithm(s) that rank-order sellers in terms…

algorithm e-commerce keyword matching cosine-similarity

asked Feb 28 '11 at 13:21

John Horton

Reputation 4,122; badges 6, 31, 45

1 answer

How to speed up cosine similarity between a numpy array and a very very large matrix?

I have a problem where I need to calculate cosine similarities between a numpy array of shape (1, 300) and a matrix of shape (5000000, 300). I have tried multiple different flavors of code and now I am wondering if there is a way to reduce the run…

python cuda gpu numba cosine-similarity

asked Dec 05 '17 at 21:04

ajaanbaahu

3 answers

How to compute similarities based on co-occurrence matrix?

I have an item-item matrix (1877 x 1877). The values in the matrix represent the number of times two items occurred together. How can I determine the similarities between two items? Through reading, I found a few options. However, I am not sure about…

python matrix cosine-similarity find-occurrences

asked Feb 01 '17 at 07:37

kitchenprinzessin

Reputation 1,023; badges 3, 14, 30

1 answer

Spark ml cosine similarity: how to get 1 to n similarity score

I read that I could use the columnSimilarities method that comes with RowMatrix to find the cosine similarity of various records (content-based). My data looks something like this: genre,actor horror,mohanlal shobhana pranav comedy,mammooty suraj…

scala apache-spark apache-spark-mllib cosine-similarity apache-spark-ml

asked Oct 18 '16 at 08:38

void

Reputation 2,403; badges 6, 28, 53

5 answers

Cosine distance of vector to matrix

In python, is there a vectorized efficient way to calculate the cosine distance of a sparse array u to a sparse matrix v, resulting in an array of elements [1, 2, ..., n] corresponding to cosine(u,v[0]), cosine(u,v[1]), ..., cosine(u, v[n])?

python vectorization cosine-similarity

asked Apr 28 '16 at 16:19

David

Reputation 1,454; badges 3, 16, 27

1 answer

Why are Cosine Similarity and TF-IDF used together?

TF-IDF and Cosine Similarity are a commonly used combination for text clustering. Each document is represented by a vector of TF-IDF weights. This is what my textbook says. With Cosine Similarity you can then compute the similarities between…

data-mining text-mining tf-idf cosine-similarity linguistics

asked Feb 09 '16 at 20:27

Evgenij Reznik

Reputation 17,916; badges 39, 104, 181

2 answers

PostgreSQL: Find sentences closest to a given sentence

I have a table of images with sentence captions. Given a new sentence I want to find the images that best match it based on how close the new sentence is to the stored old sentences. I know that I can use the @@ operator with a to_tsquery but…

postgresql full-text-search tf-idf cosine-similarity

asked Jan 05 '16 at 03:29

Real Geek N

1 answer

Understanding elasticsearch query score explain

I'm trying to decipher the explain API in the elasticsearch response, but I'm a bit lost; it's hard for me to follow. Any simple pointers or links that will explain the JSON more specifically? I have an understanding of TF, IDF and the cosine…

elasticsearch lucene tf-idf cosine-similarity

asked Oct 23 '15 at 15:24

user1189332

Reputation 1,773; badges 4, 26, 46

3 answers

How to replace string values in pandas dataframe to integers?

I have a Pandas DataFrame that contains several string values. I want to replace them with integer values in order to calculate similarities. For example: stores[['CNPJ_Store_Code','region','total_facings']].head() Out[24]: CNPJ_Store_Code …

python pandas dataframe cosine-similarity

asked Aug 06 '15 at 07:01

user3318421

2 answers

How do I calculate the shortest path (geodesic) distance between two adjectives in WordNet using Python NLTK?

Computing the semantic similarity between two synsets in WordNet can be easily done with several built-in similarity measures, such as: synset1.path_similarity(synset2) synset1.lch_similarity(synset2), Leacock-Chodorow…

python nlp nltk wordnet cosine-similarity

asked Jul 05 '15 at 19:26

modarwish

5 answers

Using k-means for document clustering, should clustering be on cosine similarity or on term vectors?

Apologies if the answer to this is obvious, please be kind, this is my first time on here :-) I would gratefully appreciate if someone could give me a steer on the appropriate input data structure for k-means. I am working on a masters dissertation…

php cluster-analysis k-means tf-idf cosine-similarity

asked May 11 '15 at 12:51

Claire McMahon

1 answer

What's the difference between Pearson correlation similarity and adjusted cosine similarity?

While they are very similar, I am sure there is some difference between Pearson correlation similarity and adjusted cosine similarity, because all the papers and web pages treat them as two different kinds. However, none of them provide a clear…

similarity cosine-similarity pearson

asked Nov 13 '14 at 02:31

Xiaoning Liu - MSFT

2 answers

Calculate cosine similarity of two matrices

I have defined two matrices as follows: from scipy import linalg, mat, dot a = mat([-0.711,0.730]) b = mat([-1.099,0.124]) Now, I want to calculate the cosine similarity of these two matrices. What is wrong with the following code? It gives me…

python numpy matrix cosine-similarity

asked Feb 24 '14 at 06:42

Nilani Algiriyage

Reputation 32,876; badges 32, 87, 121
