Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which can be computed with basic arithmetic.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (each norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
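
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: a zero vector has no direction, so reject it.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

For large or sparse data, a vectorized library routine would normally be preferable to this pure-Python loop.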

1004 questions

votes

1 answer

How to represent image or audio through vectors for cosine similarity?

I know that cosine similarity can be used to measure how similar two images or audio clips are. But I don't understand how an image can be represented as an N-dimensional vector. For a text document d, each i-th dimension represents the term t_i, and it's…

asked May 06 '16 at 10:19

justHelloWorld

votes

0 answers

computing cosine-similarity between all texts in a corpus

I have a set of documents stored in a JSON file. I retrieve them using the following code so that they are stored in the variable data: import json with open('SDM_2015.json') as f: data = [json.loads(line) for line in…

python tf-idf corpus cosine-similarity

asked Apr 27 '16 at 10:54

Economist_Ayahuasca

votes

2 answers

What are the advantages of minhash over simhash?

I am working with simhash, but I have also read that minhash is more effective, and I don't understand why. Please explain: what are the advantages of minhash over simhash?

similarity cosine-similarity minhash simhash

asked Apr 15 '16 at 12:35

xfr1end

votes

1 answer

How to use similarities.Similarity in gensim?

How do I use similarities.Similarity in gensim? If I use similarities.MatrixSimilarity: index = similarities.MatrixSimilarity(tfidf[corpus]) it just tells me: C:\Users\Administrator\AppData\Local\Enthought\Canopy\User\lib\site-…

python gensim cosine-similarity

asked Apr 12 '16 at 15:54

K. Sueca

votes

2 answers

Compute distance between maps that represent sparse vectors c++

Introduction and source code: I am trying to compute the cosine similarity between two sparse vectors of dimension 169647. As input, the two vectors are represented as a string of the form . Only the non-zero elements of the vector are…

c++ dictionary vector distance cosine-similarity

asked Jan 20 '16 at 11:10

Hani Goc

votes

4 answers

How can I calculate cosine similarity between two string vectors?

I have two vectors of dimension 6 and I would like to get a number between 0 and 1. a=c("HDa","2Pb","2","BxU","BuQ","Bve") b=c("HCK","2Pb","2","09","F","G") Can anyone explain what I should do?

r machine-learning cosine-similarity

asked Dec 02 '15 at 14:52

Ozgur Alptekın

votes

0 answers

Calculate similarity score for cells with different dimensions in R

If my columns have different dimensions for each cell but I want to have similarity scores for each pair, how can I accomplish this? Right now, I'm thinking: Step 1: Find all the unique values in a specific column. For example, a column with 100…

r loops lapply dimensions cosine-similarity

asked Nov 28 '15 at 08:09

Wenkai Ying

votes

2 answers

Extrapolate Sentence Similarity Given Word Similarities

Assuming that I have a word similarity score for each pair of words in two sentences, what is a decent approach to determining the overall sentence similarity from those scores? The word scores are calculated using cosine similarity from vectors…

wordnet cosine-similarity word2vec sentence-similarity

asked Jan 27 '15 at 04:31

Scott Klarenbach

votes

0 answers

Amplifying a locality sensitive hash

I'm trying to build a cosine locality sensitive hash so I can find candidate similar pairs of items without having to compare every possible pair. I have it basically working, but most of the pairs in my data seem to have cosine similarity in the…

machine-learning data-mining cosine-similarity locality-sensitive-hash

asked Jan 21 '15 at 10:32

Philip Pearl

votes

1 answer

Using Latent Semantic Analysis to measure passage similarity

I'm currently developing a program to compare two pieces of text based on their semantics (meaning). I understand there are libraries such as lingpipe which provide useful methods to compare string distances; however, I've heard that LSA is the best…

nlp similarity cosine-similarity lingpipe latent-semantic-analysis

asked Oct 13 '14 at 12:05

kype

votes

0 answers

Need a similarity measure for these vectors

I have a Python function that takes in a block of text and returns a special 2D vector/dictionary representation of it, depending on a chosen length n. An example output might look like this: 1: [6, 8, 1] 2: [6, 16, 4, 4, 5, 11, 5, 8] 3: [4, 7, 8,…

python algorithm vector comparison cosine-similarity

asked Sep 04 '14 at 21:54

norman

votes

1 answer

Error in computing text similarity using scikit-learn

I'm a beginner with the vector space model (VSM), and I tried the code from this site. It's a very good introduction to VSM, but I somehow managed to get different results from the author. It might be because of some compatibility problem as scikit-learn…

python machine-learning nltk cosine-similarity

asked Sep 08 '13 at 19:40

DJJ

votes

1 answer

Calculate similarity of two adverbs or two adjectives

I want to write a program to calculate the similarity of two adverbs or two adjectives, but WordNet has no ontology structure for adverbs and adjectives. As a first attempt, I used the Adapted Lesk algorithm. The result of this algorithm is very…

nlp similarity wordnet cosine-similarity wsd

asked Mar 29 '13 at 12:40

SahelSoft

votes

2 answers

Mathematical method for multiple document clustering by Cosine Similarity

Cosine similarity is often used when comparing two documents against each other. It measures the angle between the two document vectors. If the value is zero, the angle between the two vectors is 90 degrees and they share no terms. If the value is 1 the two…

machine-learning cluster-computing information-retrieval document-classification cosine-similarity

asked Dec 19 '12 at 15:26

Dheya Majid

votes

1 answer

Does a larger tf always boost a document's score in Lucene?

I understand that the default term frequency (tf) is simply calculated as the square root of the number of times the term being searched for appears in a field. So documents containing multiple occurrences of a term you are searching on will have a higher…

lucene cosine-similarity

asked Mar 07 '12 at 21:42

Paul Taylor
