Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is a popular similarity measure between two vectors because it is calculated as a normalized dot product between the two vectors, which can be calculated with simple mathematical operations.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the Euclidean norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
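The arithmetic in the example above can be checked with a minimal Python sketch. The function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for illustration, not part of the tag description; only the standard library is used.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat the undefined zero-vector case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The vectors from the description: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # approximately 0.48
```

Because the result depends only on orientation, scaling either vector by a positive constant leaves the value unchanged.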

1004 questions

votes

1 answer

Cosine similarity for already known pairs of duplicates

I have a list of duplicate document pairs saved in a CSV file. Each ID in column 1 is a duplicate of the corresponding ID in column 2. The file looks something like this: Document_ID1 Document_ID2 12345 87565 34546 …

asked Apr 20 '17 at 17:53

Minu

votes

1 answer

Why is the cosine_similarity of a pretrained fastText model high between two sentences that are not related at all?

I am wondering why the pre-trained fastText model trained on Korean Wikipedia does not seem to work well. model = fasttext.load_model("./fasttext/wiki.ko.bin") model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로") (in…

word2vec cosine-similarity doc2vec fasttext

asked Apr 18 '17 at 15:41

DSDS

votes

0 answers

Spark MLlib Scala - Creating RowMatrix from MovieLens-like Dataset

I am trying to implement cosine similarity to calculate item-item similarity using an input dataset that looks like this: UserID, ProductID, Transactions, where UserID and ProductID are Long values and Transactions is an Integer. I am following this…

scala apache-spark apache-spark-mllib cosine-similarity mahout-recommender

asked Apr 18 '17 at 04:05

saurzcode

votes

0 answers

Cosine similarity robust to shifts

Is there a generalization of cosine similarity that is robust to shifts across the compared vectors? E.g. a metric assigning high similarity to the following vectors: [0,1,1,1,2,2,0,0] [1,1,1,2,2,0,0,0]

distance cosine-similarity

asked Apr 03 '17 at 17:17

Dion

votes

0 answers

Similar Users in MovieLens Data

I am trying to find similar users in the MovieLens data using NumPy in Python so that all calculations are fast. However, I am not able to get the final code to find similarity using matrix multiplications etc. Here is the code: import pandas as…

python python-2.7 numpy machine-learning cosine-similarity

asked Mar 29 '17 at 02:54

Manish Kumar

votes

1 answer

Cosine similarity between any two sentences is giving 0.99 always

I downloaded the Stack Overflow dump (which is a 10GB file) and ran word2vec on the dump in order to get vector representations for programming terms (I require it for a project that I'm doing). Following is the code: from gensim.models import…

word2vec cosine-similarity

asked Mar 21 '17 at 13:31

morghulis

votes

0 answers

How to compute cosine similarity between words for a large DocumentTermMatrix

I have a large tdm, for which I need the cosine similarity of every term with every other term. Standard procedures are not helping, as I am getting the following error. Error: cannot allocate vector of size 1162.4 Gb. Since I am a novice with…

r parallel-processing tm cosine-similarity

asked Mar 13 '17 at 15:32

NinjaR

votes

1 answer

Write custom kernel for svm in R

I'm looking to use the svm() function of the e1071 package in R. I am new to this package and I was wondering if it is possible to write your own custom kernel callable in svm(). I see that there are several kernels pre-loaded, but I don't see a…

r machine-learning svm cosine-similarity

asked Mar 06 '17 at 18:47

user162381

votes

1 answer

How to apply content-based filtering in Neo4j

I have data in the format below, where the first column represents the product nodes and all the following columns represent properties of the products. I want to apply a content-based filtering algorithm using cosine similarity in Neo4j. For that, I believe, I need…

neo4j cosine-similarity

asked Mar 02 '17 at 12:34

Amar jaiswal

votes

1 answer

Similarity Metrics

I am trying to research different metrics and found many similarity metrics: Euclidean distance, Dynamic Time Warping, Edit Distance with Real Penalty, DISSIM, Sequence Weighted Alignment model, Spatial Assembling Distance. However I had a…

machine-learning computer-vision euclidean-distance cosine-similarity

asked Mar 01 '17 at 16:46

user2359877

votes

1 answer

Calculating cosine similarity from file vectors in Python

I would like to calculate the cosine similarity between two vectors in a file in the following format: first_vector 1 2 3 second_vector 1 3 5 ... simply the name of the vector and then its elements, separated by a single space. I have defined a…

python list file-io cosine-similarity

asked Feb 08 '17 at 22:37

Programmer

votes

1 answer

How is cosine similarity used with the K-means algorithm?

I have three text documents represented as vectors of different lengths in a VSM, where the entries are the tf-idf weights of terms. Q1: How does k-means use cosine similarity, and how are the clusters then constructed? Q2: When I use the TF-IDF algorithm, it produces a…

algorithm cluster-analysis k-means cosine-similarity trigonometry

asked Feb 07 '17 at 17:09

wilyam walass

votes

1 answer

Long running spark submit job

I am trying to run a script using spark-submit like this: spark-submit -v \ --master yarn \ --num-executors 80 \ --driver-memory 10g \ --executor-memory 10g \ --executor-cores 5 \ --class cosineSimillarity jobs-1.0.jar This script is implementing…

scala apache-spark cosine-similarity spark-submit

asked Feb 01 '17 at 23:06

MasterGoGo

votes

1 answer

Computing cosine similarity using Python

I have written the following code to compute the cosine similarity between a number of preprocessed documents (stop word removal, stemming and term frequency-inverse document frequency). print(X.shape) similarity = [] for each in X: …

text machine-learning scikit-learn cosine-similarity

asked Feb 01 '17 at 01:39

user7347576

votes

1 answer

Tf-Idf calculation for two corpuses

I have two corpuses (Corpus 1 & Corpus 2). Documents in Corpus 1 contain sentences plagiarized from Corpus 2. I'm using a Tf-Idf approach to measure the similarity between documents in Corpus 1 and documents in Corpus 2. An inverted index for terms in…

java tf-idf cosine-similarity inverted-index

asked Jan 15 '17 at 20:44

Minions

