Questions tagged [cosine-similarity]

Cosine similarity measures the similarity between two vectors of an inner product space as the cosine of the angle between them. It is a popular similarity measure because it is a normalized dot product of the two vectors, which takes only simple arithmetic to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, the vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
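The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the choice to raise an error for mismatched lengths or zero vectors are assumptions made here, not part of the tag description.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as an error, since it has no orientation.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

Scaling either vector leaves the result unchanged, which is the sense in which the measure reflects orientation rather than magnitude. With floating-point inputs the result can land slightly outside [-1, 1] due to rounding, so it may be worth clamping before passing it to an inverse cosine.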

1004 questions
0
votes
1 answer

Different document length in computing cosine similarity?

Is there any rule to follow when I want to find the cosine similarity between two documents that have different numbers of words?
vikifor
  • 3,426
  • 4
  • 45
  • 75
0
votes
0 answers

How to calculate the similarity matrix of a very large dataset by dividing the data into different patches in MATLAB?

I have a matrix of 367800x84 that rows are instances and columns are dimensions. I try to calculate a similarity matrix however it does not fit into memory if I try to calculate from whole matrix. I tried different codes but did not work. I think to…
erogol
  • 13,156
  • 33
  • 101
  • 155
0
votes
2 answers

How to obtain complexity cosine similarity in Matlab?

I have implemented cosine similarity in Matlab like this. In fact, I have a two-dimensional 50-by-50 matrix. To obtain a cosine should I compare items in a line by line form. for j = 1:50 x = dat(j,:); for i = j+1:50 y = dat(i,:); …
sima412
  • 255
  • 2
  • 7
  • 16
0
votes
3 answers

Cosine similarity result above one

I am coding cosine similarity in PHP. Sometimes the formula gives a result above one. In order to derive a degree from this number using inverse cos, it needs to be between 1 and 0. I know that I don't need a degree, as the closer it is to 1, the…
samiles
  • 3,768
  • 12
  • 44
  • 71
0
votes
1 answer

getTermFrequencyVector in lucene

I am getting to know how the Lucene function getTermFreqVector() works while computing the cosine similarity between two documents. Can anyone shed some light on what "field-name" means in getTermFreqVector(doc number,…
0
votes
1 answer

What are the pre-processing requirements on cosine similarity?

The input to cosine similarity is two vectors representing the two pieces of data I want to compare. Is there a requirement on the semantics of the vectors? Can each simply be the byte representation of a file? And then compute the frequency of each…
curious
  • 1,524
  • 6
  • 21
  • 45
0
votes
1 answer

Unexpected/undefined results when using Maps in Java

I'm doing some work trying to recommend documents, and to do so I am using the Cosine Similarity method. Here is the code for that method: static double cosineSimilarity(HashMap v1, HashMap v2) { Set both…
jk47
  • 755
  • 4
  • 10
0
votes
1 answer

Can I normalize cosine-similarities?

Is there a way to convert a list of cosine similarities to percentage? I tried to wrap my brain around this but I'm in great doubt. Would it make sense to normalize the cosine values of the four documents like so: Doc #1 0.9600 Doc #2 0.9300 Doc…
Simon Paarlberg
  • 277
  • 2
  • 10
0
votes
2 answers

Recommendation system - using different metrics

I'm looking to implement an item-based news recommendation system. There are several ways I want to track a user's interest in a news item; they include: rating (1-5), favorite, click-through, and time spent on news item. My question: what are some…
0
votes
1 answer

Machine learning: what approach to use when the dataset contains only one-class instances?

I have a dataset from a particular domain (say sports, 1 class). When I feed a web page to the classifier/clusterer, I want to get a result saying whether that instance (web page) is related to sports or not. Most of the classifiers in…
0
votes
1 answer

Fast Calculation of Pairwise Cosine Directional Distance Between Points in a (n x d x t) matrix

I am aware of pdist(X,distance) in Matlab, which takes an (n x d) matrix of points and calculates the pairwise distances between them. I am also aware that it has an extra option to calculate the cosine distance if a matrix contains vectors rather than…
oracle3001
  • 1,090
  • 19
  • 31
-1
votes
0 answers

efficient C implementation to do pairwise evaluations for given data

I'm working on machine learning problems where we need to evaluate pairwise interactions between data points a lot of times. Namely, given arbitrary data set X (m points in k dimensions) and Y (n points in k dimensions), we need to compute the…
booksee
  • 389
  • 1
  • 3
  • 16
-1
votes
0 answers

How to Detect Similar Sentences from Different Dataframes?

Let's say I have these 2 pandas dataframes: df_jkt business name address zap clinic kemang south jakarta natasha beauty clinic ciracas east jakarta erha apothecary tebet south jakarta dr viona spkk west jakarta df_tng business…
-1
votes
1 answer

Cosine Similarity > 1 in dlib face recognition

I am testing face recognition using dlib in VS Code. The code treats two faces as the same if the Euclidean distance is less than 0.6. I've written the following code to get the cosine similarity, and it gives me a cosine similarity of more…
Luke
  • 11
  • 2
-1
votes
0 answers

Why do we use cosine similarity in contrastive loss or when comparing queries and keys in a transformer?

So I was wondering about the problems cosine similarity can have when comparing two vectors. For example, if you consider the figure attached and take vector a to be the anchor point, then distance(a,b) and distance(a,c) are the same, yet the…