Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is popular because it is simply a normalized dot product: the dot product of the two vectors divided by the product of their Euclidean norms.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
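
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `cosine_similarity` and the choice to raise an error for zero vectors (where the measure is undefined) are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of the squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat the undefined case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48

# Orientation, not magnitude: same direction, orthogonal, opposite.
print(cosine_similarity((3, 4), (6, 8)))    # 1.0
print(cosine_similarity((1, 0), (0, 1)))    # 0.0
print(cosine_similarity((3, 4), (-3, -4)))  # -1.0
```

For large arrays a vectorized library routine would normally be preferable; the loop-free generator expressions here are only meant to mirror the definition term by term.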

1004 questions

votes

4 answers

Using sklearn how do I calculate the tf-idf cosine similarity between documents and a query?

My goal is to input 3 queries and find out which query is most similar to a set of 5 documents. So far I have calculated the tf-idf of the documents doing the following: from sklearn.feature_extraction.text import TfidfVectorizer def…

asked Apr 14 '19 at 16:06

OultimoCoder

votes

3 answers

Cosine similarity TSNE in sklearn.manifold

I have a small problem performing TSNE on my dataset using cosine similarity. I have calculated the cosine similarity of all of my vectors, so I have a square matrix which contains my cosine similarities: A = [[ 1 0.7 0.5 0.6 ] [ …

scikit-learn cosine-similarity

asked Apr 11 '16 at 09:58

HugoLasticot

votes

1 answer

word2vec, sum or average word embeddings?

I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings. From the experiments I've done I always get the same cosine…

cosine-similarity word2vec sentence-similarity

asked May 09 '15 at 16:23

David Batista

votes

2 answers

Finding the best cosine similarity in a set of vectors

I have n vectors, each with m elements (real numbers). I want to find the pair whose cosine similarity is maximum among all pairs. The straightforward solution would require O(n²m) time. Is there any better solution? update Cosine similarity /…

algorithm math cosine-similarity

asked Dec 01 '12 at 16:39

hs3180

votes

3 answers

clustering with cosine similarity

I have a large data set that I would like to cluster. My trial run set size is 2,500 objects; when I run it on the 'real deal' I will need to handle at least 20k objects. These objects have a cosine similarity between them. This cosine similarity…

machine-learning cluster-analysis distance cosine-similarity

asked Jun 22 '12 at 05:14

user1473883

votes

1 answer

Why use cosine similarity in Word2Vec when it's trained using dot-product similarity

According to several posts I found on stackoverflow (for instance this Why does word2Vec use cosine similarity?), it's common practice to calculate the cosine similarity between two word vectors after we have trained a word2vec (either CBOW or…

nlp word2vec cosine-similarity word-embedding dot-product

asked Jan 28 '19 at 22:10

Fred Zhang

votes

2 answers

cosine similarity built-in function in matlab

I want to calculate cosine similarity between different rows of a matrix in matlab. I wrote the following code in matlab: for i = 1:n_row for j = i:n_row S2(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j)); S2(j,i) =…

matlab matrix cosine-similarity

asked Jan 04 '18 at 18:36

Mehdi

votes

1 answer

SQL Computation of Cosine Similarity

Suppose you have a table in a database constructed as follows: create table data (v int, base int, w_td float); insert into data values (99,1,4); insert into data values (99,2,3); insert into data values (99,3,4); insert into data values…

sql cosine-similarity

asked Feb 18 '17 at 03:09

tipanverella

votes

2 answers

Cosine similarity between 0 and 1

I am interested in calculating similarity between vectors, however this similarity has to be a number between 0 and 1. There are many questions concerning tf-idf and cosine similarity, all indicating that the value lies between 0 and 1. From…

python scikit-learn gensim similarity cosine-similarity

asked May 26 '19 at 19:53

Bram Vanroy

votes

2 answers

Python: Cosine similarity between two large numpy arrays

I have two numpy arrays: Array 1: 500,000 rows x 100 cols Array 2: 160,000 rows x 100 cols I would like to find the largest cosine similarity between each row in Array 1 and Array 2. In other words, I compute the cosine similarities between the…

python numpy scikit-learn cosine-similarity

asked Aug 26 '18 at 23:18

Alex

votes

2 answers

Python tf-idf: fast way to update the tf-idf matrix

I have a dataset of several thousand rows of text. My goal is to calculate the tf-idf scores and then the cosine similarity between documents. This is what I did using gensim in Python, following the tutorial: dictionary = corpora.Dictionary(dat) corpus =…

python nlp tf-idf gensim cosine-similarity

asked Feb 13 '17 at 19:54

snowneji

votes

3 answers

create cosine similarity matrix numpy

Suppose I have a numpy matrix like the following: array([array([ 0.0072427 , 0.00669255, 0.00785213, 0.00845336, 0.01042869]), array([ 0.00710799, 0.00668831, 0.00772334, 0.00777796, 0.01049965]), array([ 0.00741872, 0.00650899, …

python numpy matrix cosine-similarity

asked Jan 28 '17 at 00:13

Sal

votes

1 answer

Pairwise Operations between Rows of Spark Dataframe (Pyspark)

I have a Spark Dataframe with two columns: id and hash_vector. The id is the id for a document and hash_vector is a SparseVector of word counts corresponding to the document (and has size 30000). There are ~100000 rows (one for each document) in…

apache-spark pyspark apache-spark-sql cosine-similarity

asked Oct 03 '16 at 15:18

SashaGreen

votes

3 answers

Vectorized cosine similarity calculation in Python

I have two large sets of vectors, A and B. Each element of A is a 1-dimensional vector of length 400, with float values between -10 and 10. For each vector in A, I'm trying to calculate the cosine similarities to all vectors in B in order to find…

python matrix cosine-similarity

asked Dec 03 '15 at 15:48

BoltzmannBrain

votes

2 answers

Right way to compute cosine similarity between two arrays?

I am working on a project that detects some features of two input images (handwritten signatures) and compares those features using cosine similarity. By two input images, I mean that one is an original image and the other is a duplicate image. Say…

c++ arrays opencv mat cosine-similarity

asked May 22 '15 at 19:02

Shruthi Kodi
