Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is a popular similarity measure because it is simply the dot product of the two vectors normalized by their lengths, which requires only basic arithmetic to compute.
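Written out for two vectors a and b with n components, the normalized dot product is:

$$
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
= \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
$$

Equivalently, it is the dot product of the unit vectors a/‖a‖ and b/‖b‖. It is undefined when either vector is the zero vector.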

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
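A small sketch in plain Python illustrating the "orientation, not magnitude" property: scaling a vector leaves its similarity unchanged, a perpendicular vector gives 0, and an opposite vector gives -1. The `cosine_similarity` function here is a hand-rolled illustrative helper, not scikit-learn's function of the same name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hand-rolled helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle, and hence the cosine, is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

v = [1.0, 2.0]
print(cosine_similarity(v, [10.0, 20.0]))  # same direction, 10x the length: 1 (up to rounding)
print(cosine_similarity(v, [-2.0, 1.0]))   # at 90 degrees: 0
print(cosine_similarity(v, [-1.0, -2.0]))  # diametrically opposed: -1 (up to rounding)
```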

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each have norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
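The worked example can be checked step by step in plain Python:

```python
import math

a = (0, 3, 4)
b = (-3, 4, 0)

dot = sum(x * y for x, y in zip(a, b))     # 0*(-3) + 3*4 + 4*0 = 12
norm_a = math.sqrt(sum(x * x for x in a))  # sqrt(0 + 9 + 16) = 5.0
norm_b = math.sqrt(sum(x * x for x in b))  # sqrt(9 + 16 + 0) = 5.0
similarity = dot / (norm_a * norm_b)       # 12 / 25 = 0.48
print(dot, norm_a, norm_b, similarity)     # prints: 12 5.0 5.0 0.48
```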

1004 questions
-1
votes
1 answer

What is the fastest method of efficiently calculating cosine similarity of one vector to many in .NET?

Below is the code I'm using currently. I'm comparing a vector of 768 floats against 50k others, and it takes about 800 ms. I'm assuming that there's a much faster implementation, either in C# or perhaps some package that I can use that does…
UnionP
-1
votes
1 answer

Cosine similarity and cosine distance formulas relation

Can someone explain these two formulas? Do they have any relationship? def _cosine_distance(a, b, data_is_normalized=False): if not data_is_normalized: a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True) b =…
-1
votes
2 answers

Determining cosine similarity for large datasets

I am currently using a dataset of over 2.5 million images, comparing the images to each other for use in a content-based recommendation engine. I use the following code to calculate the cosine similarity using some…
-1
votes
1 answer

How do I classify text using cosine similarity?

I have a typical sentiment analysis task: my dataset consists of text and 3 classes (negative, neutral, positive). I have vectorized the text using BERT sentence transformers and calculated the cosine similarity metric of my test_embeddings: output…
-1
votes
1 answer

How to create similarity matrix between words using w2v

I created training data using word2vec. I used 'wv.similarity' to find the cosine similarity between word1 and word2. I want to find the cosine similarity between all words (like a correlation table) in a list, but I don't know how. [word1, word2,…
-1
votes
1 answer

SVM using cosine kernel - dataset with images of dogs and cats

Hello, I am trying to implement an SVM using a cosine kernel, but I can't understand how to do this. What I thought of was the following, but I think it's wrong: svmCosine = cosine_similarity(train_X, train_y) svmCosine.fit(train_X, train_y) Could…
gma
-1
votes
1 answer

How to get the top k similar items given item vectors in a Spark DataFrame?

I get a Spark DataFrame like below, where result is the id's vector: +--------------------+--------------------+ | id | result| +--------------------+--------------------+ |000ab862128e11eab...|[-0.46, 0.31, 0.2] |…
-1
votes
1 answer

cosine_sim between a text and a single column in a dataset

I have a dataset that I had to lemmatize, which I did below. Then I have to find the similarity between one column, "text", and the phrase "vaccine is deadly", but I'm not sure how to use the cosine similarity function correctly. I tried putting the…
-1
votes
1 answer

Cosine similarity preprocessing task

I have recently started with NLP. As part of cosine similarities calculation I have to complete the following task: # Convert the sentences into bag-of-words vectors. sent_1 = dictionary.doc2bow(sent_1) sent_2 = dictionary.doc2bow(sent_2) sent_3 =…
Ley
-1
votes
1 answer

Compare a list with the rows in pandas using Cosine similarity and get the rank

I have a pandas DataFrame and a user input. I need to compare the user input with each of the rows in the DataFrame and get a ranked list of the rows based on cosine similarity. Department Country Age Grade Score Math …
pyds_learner
-1
votes
1 answer

Removing duplicates using a cosine similarity matrix in a pandas DataFrame

I am removing duplicates from a large string input, and I created the cosine similarity matrix given below. 0 1 2 3 4 0 1.000000 0.515303 0.741283 0.035133 0.076743 1 0.920776 1.000000 0.153878 0.024261 …
-1
votes
1 answer

How to compute a cosine similarity in a matrix?

My original data is pretty large. It is about: data = [[0, 0, 0, ......0] [0, 0.124, 0, ..0] . . . [0, 0, 0, 0, 0.174]] data2 = [[0, 0, 0, ......0] [0, 0.74, 0, ..,0] . . . [0, 0, 0.15, 0,…
賴韋安
-1
votes
1 answer

Text Similarity - Cosine - Control

I would like to ask if anybody could check my code, because it was behaving weirdly: first it was not working and giving me errors, then it was suddenly working without my changing anything. The code is at the bottom. Background: So my goal is to calculate text…
-1
votes
1 answer

Location coordinates representation

What is the best way to represent longitude and latitude when calculating the similarities between items? Basically, I'm trying to do cosine similarity between multiple items. In addition to the typical features and metadata, I want to include the…
-1
votes
1 answer

Finding cosine similarity in R

I have two CSV files, each with up to 50,000 character values in its first column. I have to calculate the cosine similarity between these columns of the two files. I have tried to use LSA in R, but there is some problem with my result. Can any…
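Several of the questions above (one vector against 50k others, an all-pairs similarity matrix, ranking rows by similarity, the relation between cosine similarity and cosine distance) come down to the same idea: normalize each vector once, then a single matrix product yields every cosine similarity. The following is a minimal NumPy sketch, assuming the vectors are stored as rows of a float array and the query is non-zero; the function names and the random data are hypothetical and are not taken from any of the answers.

```python
import numpy as np

def normalize_rows(m):
    """Scale each row of a 2-D array to unit Euclidean length (zero rows stay zero)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def cosine_one_to_many(query, matrix):
    """Cosine similarity of a non-zero 1-D query against every row of a 2-D matrix."""
    q = query / np.linalg.norm(query)
    return normalize_rows(matrix) @ q

def cosine_all_pairs(matrix):
    """Square matrix whose (i, j) entry is the cosine similarity of rows i and j."""
    unit = normalize_rows(matrix)
    return unit @ unit.T

def top_k(query, matrix, k):
    """Indices of the k rows most similar to the query, most similar first."""
    sims = cosine_one_to_many(query, matrix)
    return np.argsort(-sims)[:k]

# Hypothetical data: 1,000 random vectors of 768 floats each.
rng = np.random.default_rng(0)
items = rng.standard_normal((1000, 768)).astype(np.float32)
query = items[42]

sims = cosine_one_to_many(query, items)
# Cosine distance is commonly defined as 1 - cosine similarity
# (this is how scipy.spatial.distance.cosine defines it).
distances = 1.0 - sims
print(top_k(query, items, 3))  # row 42 comes first: its similarity with itself is 1
```

Normalizing once and using a matrix product replaces a per-pair loop with one vectorized operation, which is usually where the large speedups come from. Note that the all-pairs matrix grows quadratically with the number of rows, so for millions of items it has to be computed in blocks or replaced by an approximate nearest-neighbour index.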