Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is a popular similarity measure between two vectors because it is calculated as a normalized dot product between the two vectors, which can be calculated with simple mathematical operations.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5·5) = 0.48.
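
As a minimal sketch of the computation described above, the following pure-Python function divides the dot product by the product of the two Euclidean norms. The function name `cosine_similarity` and the decision to raise an error for zero-length vectors are illustrative assumptions, not part of the tag description.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")

    # Dot product: sum of the pairwise products of the components.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean norm: square root of the sum of the squared components.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The angle is undefined when either vector has zero length
    # (raising here is an assumption made for this sketch).
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


# The worked example from the description: dot product 12, norms 5 and 5.
example = cosine_similarity((0, 3, 4), (-3, 4, 0))

# Scaling a vector changes its magnitude but not its orientation,
# so the similarity should be unchanged up to floating-point rounding.
scaled = cosine_similarity((0, 30, 40), (-3, 4, 0))
```

Because only orientation matters, vectors whose components differ slightly relative to a few very large shared components can still score close to 1.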

1004 questions
-2
votes
3 answers

Is there a way to vectorize only words, i.e. not from a corpus or bag of words, in Python?

My use case is to vectorize words in two lists like below. ListA = [Japan, Electronics, Manufacturing, Science] ListB = [China, Electronics, AI, Software, Science] I understand that word2vec and Glove can vectorize words but they do that through…
Ridhima Kumar
  • 151
  • 3
  • 14
-2
votes
1 answer

How to find similarity score between two PDFs stored in HDFS

I have PDFs stored in Hadoop HDFS as unstructured data. I want to find if two PDFs are similar or not and what is the similarity and dissimilarity of these two PDFs. I am new to this, so it will be very helpful if you can help me with code and its…
-2
votes
1 answer

Using known python packages for implementing N-Gram, TF-IDF and Cosine similarity

I'm trying to implement a similarity function using N-Grams TF-IDF Cosine Similarity Example Concept: words = [...] word = '...' similarity = predict(words,word) def predict(words,word): words_ngrams = create_ngrams(words,range=(2,4)) …
Sahar Millis
  • 801
  • 2
  • 13
  • 21
-2
votes
1 answer

Can cosine similarity be an objective function for deep learning?

I want to train an output vector (produced by a deep learning model) to resemble a fixed vector. Hence, I chose the cosine similarity between the two vectors as the objective function. However, I don't know if that is the correct approach for my need.
-2
votes
1 answer

Why are two vectors not similar, yet the result is 1?

I'm using the cosine similarity formula to calculate the similarity between two vectors. I tried two different vectors like this: Vector1(-1237373741, 27, 1, 1, 331289590, 1818540802) Vector2(-1237373741, 49, 1, 1, 331289590, 1818540802) Two vectors has a…
-2
votes
1 answer

Can someone show me how to work out simple cosine similarity graphically?

Can someone show me how to work out cosine similarity, please? I understand that someone has answered a similar question before (similar question link), but I do not understand how the end result was reached.
user3412172
  • 489
  • 2
  • 6
  • 10
-2
votes
1 answer

Error Law of cosines java

I need to compute the law of cosines for the given values as shown above. I tested each one of the values to see if the correct computations are made for the parts of the equation. I need to find the cosine for the given angle "a" and in the…
ecke
  • 11
  • 1
  • 1
  • 4
-2
votes
1 answer

Python: finding score similarity between users within a cluster

How can I calculate similarity between user and score? For example, df: user score category_cluster i 4.5 category1 j 5 category1 k 9.5 category2 I want to have a result like: similarity between…
-2
votes
1 answer

Find similar items in a dataset

I have a dataset of 500 mobile devices having 10 attributes, namely Date|Company|ModelName|Price|HardDisk|RAM|Colour|Display size|Cam1|Cam2 The sample dataset is given below : 24/10/2015 | walmart | Samsung Galaxy Note 4 N910H 32GB…
Akshat Kumar
  • 69
  • 3
  • 13
-2
votes
1 answer

Retrieving top k similar rows in a matrix for each row via cosine similarity in R

How to efficiently retrieve top K-similar vectors by cosine similarity using R? asks how to calculate top similar vectors for each vector of one matrix, relative to another matrix. It's satisfactorily answered, and I'd like to tweak it to operate on…
Max Ghenis
  • 14,783
  • 16
  • 84
  • 132
-2
votes
1 answer

Python - Cosine Similarity with Nested Dictionary Structure

I'm trying to perform a cosine similarity of the vector of food amounts between various students. I have a CSV file that contains: Student food amount John apple 15 John banana 20 John orange 1 John grape …
user3330107
  • 71
  • 1
  • 6
-3
votes
1 answer

What's the best algorithm to check the similarity percentage among the submitted assignments?

I am planning to build a final-year project that works like a similarity checker. In the project, I am planning to check the similarity percentage among the submitted assignments, i.e. offline. For example: When the first student submits an…
John
  • 972
  • 1
  • 7
  • 24
-3
votes
1 answer

How to make a binary vector for cosine similarity in R

I have two vectors of different sizes with different values. v1=c("3423","3221","65892","8033") v2=c("3423","3221","9923") From these two vectors, I have the following set of values. {"3423","3221","65892","8033","9923"} Now I want to…
Alvi
  • 123
  • 1
  • 3
  • 14