Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is popular because it is simply the dot product of the two vectors divided by the product of their lengths, which takes only a few basic arithmetic operations to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each have norm 5, so their cosine similarity is 12/5/5 = 0.48.
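A minimal plain-Python sketch of the formula, checked against the worked example. The function name cosine_similarity is our own choice for illustration, not a library API.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle, and therefore the cosine, is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # prints 0.48
```

Raising on a zero vector is one design choice; some libraries return 0 instead.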

1004 questions
0
votes
1 answer

Type Error when Comparing Two Dictionaries Using Cosine Similarity in Python

I have received a type error while comparing two dictionaries using cosine similarity. I have tried searching around but am still not able to solve it, and would really appreciate it if anyone could shed some light on this. My dictionaries look like…
Yoshiaki
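The dictionaries in the question are cut off, so the following only assumes they map keys (for example words) to numbers. One option is to treat each dict directly as a sparse vector, where a key missing from one dict counts as 0 in that vector. The name dict_cosine is hypothetical, and returning 0.0 when either dict has zero norm is a design choice, since cosine similarity is undefined there.

```python
import math

def dict_cosine(d1, d2):
    """Cosine similarity of two sparse vectors stored as {key: number} dicts.

    Keys missing from one dict are treated as 0 in that vector.
    """
    # Only keys present in both dicts contribute to the dot product.
    common = d1.keys() & d2.keys()
    dot = sum(d1[k] * d2[k] for k in common)
    norm1 = math.sqrt(sum(v * v for v in d1.values()))
    norm2 = math.sqrt(sum(v * v for v in d2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```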
0
votes
4 answers

Cosine-similarity performance in Java 15 times slower than equivalent C?

I have two functions, each of which calculates the cosine similarity of two different vectors. One is written in Java, and one in C. In both cases I am declaring two 200-element arrays inline, and then calculating their cosine similarity 1 million…
Scott Klarenbach
0
votes
0 answers

Convert cosine similarities back to their respective strings

I am clustering many sentences based on their cosine similarity. For example, the cosine similarity of string a and string b is close to that of string c and string b. The clustering method will group them together in list form, but the values that…
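The question does not name its clustering method, so this sketch substitutes a simple greedy threshold clustering (an assumption, not the asker's algorithm). What it illustrates is keeping an index per sentence, so each cluster can be mapped back to the original strings rather than to similarity values. The whitespace tokenizer, the 0.5 threshold and all function names are arbitrary illustrative choices.

```python
import math
from collections import Counter

def bow(sentence):
    # Illustrative tokenizer: lowercase and split on whitespace.
    return Counter(sentence.lower().split())

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def cluster_sentences(sentences, threshold=0.5):
    """Greedy single-pass clustering that keeps the original strings.

    Each sentence joins the first cluster whose first member (the "seed")
    is at least `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [bow(s) for s in sentences]
    clusters = []  # each cluster is a list of indices into `sentences`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    # Map indices back to strings instead of returning similarity values.
    return [[sentences[i] for i in members] for members in clusters]
```

For example, "the cat sat" and "the cat ran" share two of three words (similarity 2/3) and land in one cluster, while "stock prices fell" starts its own.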
0
votes
1 answer

Calculating cosine similarity using MapReduce

I am trying to make an item-based recommendation using cosine similarity with MapReduce. Here's the input set: itemIdx_1, userIdx_1; itemIdx_1, userIdx_2; itemIdx_2, userIdx_1; itemIdx_3, userIdx_3; ... How should I design the job for this input data? To use…
Hoon
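The input in the question is truncated, so this sketch assumes each row means "this user interacted with this item" (binary, implicit feedback). It simulates the map, shuffle and reduce phases in plain Python on one machine; the function name is hypothetical, and a real Hadoop job would split these phases across separate map and reduce tasks.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_cosine_mapreduce(pairs):
    """Item-item cosine similarity from (item, user) pairs, MapReduce-style.

    Assumes binary implicit feedback, where
    cos(i, j) = |U_i & U_j| / sqrt(|U_i| * |U_j|)
    and U_i is the set of users who interacted with item i.
    """
    # Map: re-key each (item, user) record by user.
    # Shuffle: group items per user (a set drops duplicate records).
    items_by_user = defaultdict(set)
    for item, user in pairs:
        items_by_user[user].add(item)

    # Reduce, once per user: emit (item, 1) for every item and
    # ((item_a, item_b), 1) for every pair of items that user touched.
    # A second job would sum these per key; here the sums go straight into dicts.
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in items_by_user.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    # Final step: divide each co-occurrence count by sqrt(|U_a| * |U_b|).
    return {
        (a, b): co / math.sqrt(item_counts[a] * item_counts[b])
        for (a, b), co in pair_counts.items()
    }
```

On the four rows visible in the question, only itemIdx_1 and itemIdx_2 share a user (userIdx_1), giving 1 / sqrt(2 * 1) ≈ 0.707; itemIdx_3 shares no user with any other item and gets no entry.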
0
votes
1 answer

Cosine Similarity of Eigenvectors of Two Different Matrices

Is the cosine similarity of the eigenvectors of two very large matrices a valid measure of how similar the matrices are? I have two very large matrices A and B. I found: -> covariance matrices CA and CB, -> top 20 eigenvectors of CA…
user2761431
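A hedged NumPy sketch of the comparison described, assuming A and B have the same number of columns (variables) so their covariance matrices have the same size. Two details matter: eigenvectors are only defined up to sign, so the absolute value of the cosine is compared, and numpy.linalg.eigh returns eigenvalues in ascending order, so the columns are re-sorted to pick the top k. The function names are our own.

```python
import numpy as np

def top_eigenvectors(data, k):
    """Top-k eigenvectors (as columns) of the covariance matrix of `data`.

    Rows of `data` are observations, columns are variables.
    """
    cov = np.cov(data, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

def eigenvector_similarity(vecs_a, vecs_b):
    """Absolute cosine similarity between matching eigenvector columns.

    v and -v are the same eigenvector, so the absolute value removes the
    sign ambiguity. eigh returns unit-norm columns, so the dot product
    already equals the cosine.
    """
    return np.abs(np.sum(vecs_a * vecs_b, axis=0))
```

Pairing eigenvectors by rank assumes the eigenvalues are well separated; when two eigenvalues are close, their eigenvectors can swap or rotate within a shared subspace, and comparing the spanned subspaces (for example with scipy.linalg.subspace_angles) may be more robust.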
0
votes
1 answer

Information retrieval, inverted index issue

Hi, I'm trying to write a little program that indexes some documents from an XML collection. I use the tf-idf method. Now, when my program reads the query, it returns a list of tuples ('tf-idf','docid') for each word in each document. This is an…
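The asker's code is not shown, so this is a generic sketch rather than a fix for it. An inverted index maps each term to a posting list of (document, tf-idf weight) entries; a query is scored by accumulating dot products over the posting lists of its terms, then dividing by both vector norms. Raw counts for tf, log(N / df) for idf, whitespace tokenization and all names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index {term: {doc_id: tf-idf weight}} plus doc norms."""
    n_docs = len(docs)
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n_docs / d) for term, d in df.items()}
    index = defaultdict(dict)  # term -> {doc_id: weight} (the posting lists)
    sq_norms = defaultdict(float)
    for doc_id, c in counts.items():
        for term, tf in c.items():
            weight = tf * idf[term]
            index[term][doc_id] = weight
            sq_norms[doc_id] += weight * weight
    norms = {doc_id: math.sqrt(s) for doc_id, s in sq_norms.items()}
    return index, idf, norms

def search(query, index, idf, norms):
    """Return (cosine score, doc_id) pairs, best first."""
    # Ignore query terms that never occur in the collection.
    q = Counter(t for t in query.lower().split() if t in idf)
    q_weights = {t: tf * idf[t] for t, tf in q.items()}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    scores = defaultdict(float)
    # Accumulate dot products one posting list at a time.
    for term, qw in q_weights.items():
        for doc_id, dw in index[term].items():
            scores[doc_id] += qw * dw
    ranked = [
        (score / (q_norm * norms[doc_id]), doc_id)
        for doc_id, score in scores.items()
        if q_norm and norms[doc_id]
    ]
    return sorted(ranked, reverse=True)
```

Summing the per-term contributions into one score per document is what turns per-word tuples into a single ranked list.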
0
votes
1 answer

Better understanding of cosine similarity

I am doing a little research on text mining and data mining, and I need more help understanding cosine similarity. I have read about it and noticed that all of the examples on the internet use tf-idf before computing it through…
user3809384
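Cosine similarity itself does not require tf-idf; tf-idf is only one way to choose the vector entries. To illustrate, this sketch scores the same pair of made-up toy documents twice: once on raw term counts and once after reweighting with idf = log(N / df), which is one common variant among several. All names and documents are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tfidf(counts, corpus_counts):
    """Reweight raw counts by idf = log(N / df), computed over `corpus_counts`."""
    n = len(corpus_counts)
    df = Counter(t for c in corpus_counts for t in c)
    return {t: tf * math.log(n / df[t]) for t, tf in counts.items()}

docs = ["the the cat", "the the dog", "the bird"]
counts = [Counter(d.split()) for d in docs]

raw = cosine(counts[0], counts[1])  # driven mostly by the shared word "the"
weighted = cosine(tfidf(counts[0], counts), tfidf(counts[1], counts))
```

On raw counts the shared word "the" makes the first two documents look similar (about 0.8). Because "the" occurs in every document its idf is 0, so after reweighting only "cat" and "dog" remain and the similarity drops to 0. This is why tutorials usually apply tf-idf first.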
0
votes
1 answer

Is this an approach to user-item recommendations that could work?

I am designing an application that incorporates a recommendation system based on user interactions (collaborative filtering). On their homepage, users are presented with a set of 6 items to interact with. There will be between 50 and 300 items. The…
0
votes
1 answer

Using TF-IDF for relative frequency, cosine similarity

I'm trying to use TF-IDF for relative frequency to calculate cosine distance. I've selected 10 words from one document (say, File 1) and another 10 files from my folder, and I'm using the 10 words and their frequencies to check which of the 10 files are…
user2100552
0
votes
1 answer

Best way to compute a correlation coefficient for nominal data similarity

I hope someone can help me with this one (please): I want to compute the similarity between some article features (author, category, year, impact factor, citation), and I don't have a clue how to do it for the nominal data. For the numerical features I can…
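Cosine similarity needs numeric vectors, so nominal fields such as author or category need different treatment. One established option for mixed nominal and numeric features, swapped in here rather than taken from the question, is Gower's similarity coefficient: nominal features score 1 when equal and 0 otherwise, numeric features score 1 - |x - y| / range, and the scores are averaged. The record layout (dicts keyed by feature name) and the function name are assumptions.

```python
def gower_similarity(a, b, nominal, ranges):
    """Gower-style similarity between two records with mixed feature types.

    nominal: names of categorical features (score 1 if equal, else 0).
    ranges:  {feature: max - min over the dataset} for numeric features
             (score 1 - |a - b| / range).
    Returns the average per-feature score, in [0, 1].
    """
    scores = []
    for feature in nominal:
        scores.append(1.0 if a[feature] == b[feature] else 0.0)
    for feature, rng in ranges.items():
        if rng == 0:
            scores.append(1.0)  # constant feature: every record is identical on it
        else:
            scores.append(1.0 - abs(a[feature] - b[feature]) / rng)
    if not scores:
        raise ValueError("no features to compare")
    return sum(scores) / len(scores)
```

For example, two hypothetical articles that share a category but not an author, with years 2 apart (range 10) and citation counts 20 apart (range 100), score (0 + 1 + 0.8 + 0.8) / 4 = 0.65.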
0
votes
1 answer

MapReduce way to calculate a user similarity matrix

I have a list of many users (over 10 million) each of which is represented by a userid followed by 10 floating-point numbers indicating their preference. I would like to efficiently calculate the user similarity matrix using cosine similarity based…
Yang
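A full 10-million-by-10-million similarity matrix has 10^14 entries, so in practice one usually keeps only the top k neighbours per user. This NumPy sketch is not MapReduce code; it shows the per-block work a mapper could do, assuming each worker can see all normalized preference vectors. Rows are normalized once, so each block product is already a block of cosine similarities. The function name and block size are hypothetical.

```python
import numpy as np

def top_k_similar(prefs, k, block_size=1024):
    """For each user, indices and cosine scores of the k most similar others.

    prefs: (n_users, n_features) array of preference vectors; requires k < n_users.
    Only one (block_size x n_users) block of similarities is held at a time.
    """
    norms = np.linalg.norm(prefs, axis=1, keepdims=True)
    unit = prefs / np.where(norms == 0, 1.0, norms)  # zero rows stay zero
    n = unit.shape[0]
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k))
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = unit[start:stop] @ unit.T  # cosine similarities for this block
        # Exclude each user's similarity with themselves.
        sims[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        # argpartition finds the k largest without a full sort; then sort those k.
        part = np.argpartition(-sims, k, axis=1)[:, :k]
        part_sims = np.take_along_axis(sims, part, axis=1)
        order = np.argsort(-part_sims, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sims, order, axis=1)
    return top_idx, top_sim
```

Each block holds block_size × n_users floats, so with 10 million users the block size would need to be small, or the columns would also need to be split into chunks.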
0
votes
1 answer

Best way to find document similarity

I'm new to NLP and I want to find the similarity between two documents. I googled and found that there are several ways to do it, e.g. shingling to find text resemblance, cosine similarity, or Lucene tf-idf. What is the best way to do this (I'm open…
Imran
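Shingling and cosine similarity answer slightly different questions: w-word shingles keep local word order, which suits near-duplicate detection, while cosine similarity on a bag of words ignores order. A minimal sketch of shingle-based resemblance (the Jaccard similarity of the two shingle sets), with whitespace tokenization, w = 3 and the function names as illustrative assumptions:

```python
def shingles(text, w=3):
    """Set of contiguous w-word shingles from a text (whitespace tokenization)."""
    words = text.lower().split()
    if len(words) < w:
        # Texts shorter than w words become a single shingle (design choice).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def resemblance(a, b, w=3):
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

For "a rose is a rose is a rose" and "a rose is a flower", the 3-word shingle sets share 2 of 4 distinct shingles, so the resemblance is 0.5.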
0
votes
0 answers

Clustering, but with conditions (in R)

I am doing some clustering of documents using cosine similarity between each document. This is fine. However my problem is a little strange in that I only want to cluster certain documents with others, not all of the documents against each other. …
user2680293
0
votes
1 answer

Add stop_words while performing TF-IDF cosine similarity

I'm using sklearn to perform cosine similarity. Is there a way to consider all the words starting with a capital letter as stop words?
DJJ
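A hedged sketch, assuming the TF-IDF vectors are built with scikit-learn's TfidfVectorizer. The vectorizer lowercases text by default before tokenizing, and stop words are removed after that step, so capitalization is already gone by the time a stop-word list is applied. One option is a custom preprocessor that drops capitalized words first. The name drop_capitalized is hypothetical, and the regular expression only recognizes ASCII capitals A-Z.

```python
import re

# A word that starts with an ASCII uppercase letter. \b prevents matching the
# "P" inside "iPhone", because "i" and "P" are both word characters.
CAPITALIZED = re.compile(r"\b[A-Z]\w*")

def drop_capitalized(text):
    """Remove words starting with a capital letter, then lowercase the rest."""
    return CAPITALIZED.sub(" ", text).lower()
```

It could then be passed as TfidfVectorizer(preprocessor=drop_capitalized), with sklearn.metrics.pairwise.cosine_similarity applied to the resulting matrix. A custom preprocessor also replaces the built-in strip_accents step, and this approach removes sentence-initial words such as "The" as well.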
0
votes
2 answers

Measuring distance between vectors

I have a set of 300,000 or so vectors which I would like to compare in some way: given one vector, I want to be able to find the closest vector. I have thought of three methods: simple Euclidean distance, cosine similarity, use a kernel (for…
halfdanr
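For vectors normalized to unit length, squared Euclidean distance and cosine similarity are tied by ||a - b||^2 = 2 - 2 cos(a, b), so both give the same nearest neighbour. On unnormalized vectors they can disagree, because Euclidean distance also reacts to magnitude. A small NumPy sketch of brute-force search with both measures (the function names are our own, and zero vectors are not handled):

```python
import numpy as np

def nearest_by_cosine(vectors, query):
    """Index of the stored vector with the highest cosine similarity to `query`.

    vectors: (n, d) array; query: (d,) array.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return int(np.argmax(unit @ q))

def nearest_by_euclidean(vectors, query):
    """Index of the stored vector closest to `query` in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(vectors - query, axis=1)))
```

For example, with stored vectors (10, 0) and (1, 1) and query (1, 0.2), Euclidean distance picks (1, 1) while cosine similarity picks (10, 0); after normalizing all vectors, Euclidean distance also picks (10, 0).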