Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which requires only basic arithmetic to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (each norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
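The arithmetic in the example above can be checked with a short Python sketch. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for illustration; they are not taken from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean (L2) norms: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat the undefined zero-vector case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The vectors from the example: dot product 12, norms 5 and 5.
sim = cosine_similarity((0, 3, 4), (-3, 4, 0))

# Scaling one vector changes its magnitude but not its orientation.
scaled = cosine_similarity((0, 30, 40), (-3, 4, 0))

print(sim, scaled)
```

Both calls should give approximately 0.48, which illustrates that the measure depends on orientation and not on magnitude.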

1004 questions

1 answer

Calculating similarity based on attributes

My objective is to calculate the degree of similarity between two users based on their attributes. For instance, let's consider a player and consider age, salary, and points as attributes. Also, I want to place weight on each attribute by order of…

asked Nov 02 '16 at 14:53

user1010101

3 answers

python glove similarity measure calculation

I am trying to understand how python-glove computes most-similar terms. Is it using cosine similarity? Example from python-glove github https://github.com/maciejkula/glove-python/tree/master/glove : I know that from gensim's word2vec, the…

python similarity cosine-similarity

asked Oct 31 '16 at 06:08

jxn

2 answers

Cosine similarity using TFIDF

There are several questions on SO and the web describing how to take the cosine similarity between two strings, and even between two strings with TFIDF as weights. But the output of a function like scikit's linear_kernel confuses me a…

python tf-idf cosine-similarity

asked Apr 21 '16 at 12:36

David

2 answers

How to run a large matrix for cosine similarity in Python?

I want to calculate cosine similarity between articles. And I am running into the problem that my implementation approach would take a long time for the size of the data that I am going to run. from scipy import spatial import numpy as np from…

python numpy scikit-learn cosine-similarity

asked Jan 20 '16 at 03:07

YAL

1 answer

cosine distance between two matrices

Take two matrices, arr1 and arr2, of size mxn and pxn respectively. I'm trying to find the cosine distance of their respective rows as an mxp matrix. Essentially I want to take the pairwise dot product of the rows, then divide by the outer product of…

python numpy cosine-similarity

asked Oct 16 '15 at 01:05

Kevin Johnson

2 answers

To find cosine similarity between two string(names)

I am using Python and scikit-learn to find the cosine similarity between two strings (specifically, names). The program is able to find the similarity score between two strings, but when strings are abbreviated, it shows some undesirable output. e.g-…

python machine-learning scikit-learn cosine-similarity

asked Sep 09 '15 at 11:50

Narendra Rawat

1 answer

Spark Cosine Similarity (DIMSUM algorithm ) sparse input file

I was wondering whether it would be possible for Spark Cosine Similarity to work with Sparse input data? I have seen examples wherein the input consists of lines of space-separated features of the form: id feat1 feat2 feat3 ... but I have an…

apache-spark sparse-matrix cosine-similarity

asked May 05 '15 at 18:08

anonuser0428

0 answers

Sparse vector dot product with mongo aggregate

I am having documents with an attached sparse vector like this: { "_id" : ObjectId "vec" : [ { "dim" : 1, "weight" : 8 }, { "dim" : 3, "weight" : 3 } ] } I am trying to get the normalised dot…

mongodb cosine-similarity

asked Feb 07 '15 at 08:34

Hans Mündelein

1 answer

some questions on cosine similarity

Yesterday I learnt that the cosine similarity, defined as cos(θ) = A·B / (‖A‖ ‖B‖), can effectively measure how similar two vectors are. I find that the definition here uses the L2-norm to normalize the dot product of A and B; what I am interested in is why not use the…

cluster-analysis distance data-mining similarity cosine-similarity

asked Aug 22 '14 at 03:27

John Smith

1 answer

tm.package: findAssocs vs Cosine

I'm new here and my question is of a mathematical rather than a programming nature; I would like to get a second opinion on whether my approach makes sense. I was trying to find associations between words in my corpus using the function…

r math text-mining tm cosine-similarity

asked Jan 25 '14 at 23:34

IVR

1 answer

Cosine Similarity PHP

I want to calculate the cosine similarity between 1 (ID1) and 3 (ID1) in PHP, similarly for 1 and 4, 3 and 4. formula would be something like this: similarity = (1.1 * 3.1 + 1.4 * 3.4)/(((1.1)^2+(1.3)^2+(1.4)^2)^0.5)(((3.1)^2+ (3.4)^2)^0.5) =…

php cosine-similarity

asked May 28 '13 at 23:43

user2044770

1 answer

how can I implement the tf-idf and cosine similarity in Lucene?

How can I implement the tf-idf and cosine similarity in Lucene? I'm using Lucene 4.2. The program that I've created does not use tf-idf and cosine similarity; it only uses TopScoreDocCollector. import com.mysql.jdbc.Statement; import…

java lucene tf-idf cosine-similarity

asked Apr 24 '13 at 22:12

Tia Chandrawati

1 answer

Interpretation of cosine similarity and jaccard similarity (similarity of histograms)

Introduction I would like to assess the similarity between two "bin counts" arrays (related to two histograms), by using the Matlab "pdist2" function: % Input bin_counts_a = [689 430 311 135 66 67 99 23 37 19 8 4 …

matlab histogram similarity cosine-similarity pdist

asked Jun 26 '23 at 13:56

limone

1 answer

How can I find the cosine similarity between two song lyrics represented as strings?

My friends and I are doing an NLP project on song recommendation. Context: We originally planned on giving the model a recommended song playlist that has the most similar lyrics based on the random input corpus (from the literature, etc.); however, we…

nlp stanford-nlp bert-language-model cosine-similarity nlp-question-answering

asked May 06 '23 at 13:43

yyy818

0 answers

How do I group words that are in a list based on cosine similarity?

I have a list of strings: my_list = ['policeman', 'police officers', 'police force',..]. The length of the list is around 2000. I want to group those words based on cosine similarity. If the cosine similarity is above 0.7, I want to group them together.…

python-3.x list grouping cosine-similarity

asked Apr 15 '23 at 11:11

Fio
