Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is a popular measure because it is simply a normalized dot product: the dot product of the two vectors divided by the product of their Euclidean norms.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
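The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the choice to raise an error for zero-length vectors are assumptions made here, since the cosine is undefined when either norm is 0.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of the squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: treat the undefined case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

Scaling either input leaves the result unchanged, which is the "orientation and not magnitude" property: `(1, 0)` and `(-2, 0)` give -1, and `(1, 0)` and `(0, 1)` give 0.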

1004 questions

2 answers

What is the most efficient way to identify text similarity between items in large lists of strings in Python?

The following piece of code achieves the results I'm trying to achieve. There is a list of strings called 'lemmas' that contains the accepted forms of a specific class of words. The other list, called 'forms' contains a lot of spelling variations of…

asked Apr 05 '23 at 20:54

jfontana

0 answers

How to get the most similar match using BERT from a pandas column to an input string?

I am trying to find the most similar match in a column of a pandas dataframe to an input string that is not in English (Swedish). This is what I have tried. I have encoded both my input string and the texts in the pandas' column and then I tried to…

python-3.x pandas bert-language-model cosine-similarity

asked Feb 08 '23 at 14:47

Vai

1 answer

Calculate Distance Metric between Homomorphic Encrypted Vectors

Is there a way to calculate a distance metric (euclidean or cosine similarity or manhattan) between two homomorphically encrypted vectors? Specifically, I'm looking to generate embeddings of documents (using a transformer), homomorphically…

python embedding cosine-similarity euclidean-distance homomorphic-encryption

asked Nov 11 '22 at 01:12

Brian Behe

3 answers

Huggingface Transformers FAISS index scores

Huggingface transformers library has a pretty awesome feature: it can create a FAISS index on embeddings dataset which allows searching for the nearest neighbors. train_ds['train'].add_faiss_index("embedding") scores, sample =…

huggingface-transformers cosine-similarity faiss

asked Aug 08 '22 at 20:01

Nik

1 answer

Getting similarity score with spacy and a transformer model

I've been using the spacy en_core_web_lg and wanted to try out en_core_web_trf (transformer model), but I'm having some trouble wrapping my head around the difference in the model/pipeline usage. My use case looks like the following: import spacy from…

nlp cosine-similarity spacy-3 spacy-transformers

asked May 31 '22 at 21:49

Connor

1 answer

What is the equivalent of python's faiss.normalize_L2() in C++?

I want to perform similarity search using FAISS for 100k facial embeddings in C++. For the distance calculator I would like to use cosine similarity. For this purpose, I chose faiss::IndexFlatIP. But according to the documentation we need to…

c++ face-recognition cosine-similarity faiss

asked Jan 31 '22 at 10:12

Sabbir Talukdar

1 answer

Python compute cosine similarity on two directories of files

I have two directories of files. One contains human-transcribed files and the other contains IBM Watson transcribed files. Both directories have the same number of files, and both were transcribed from the same telephony recordings. I'm computing…

python nlp spacy cosine-similarity

asked Oct 20 '21 at 21:20

jtoepp

1 answer

Cosine distance more than 1

I'm using the distance.cosine function from the scipy.spatial Python package. The problem is that my code returns some values that are greater than one. How is that possible? My code is very simple but that's it: for i in…

python scipy cosine-similarity

asked Jun 26 '21 at 13:31

Barbamento

2 answers

Top N Values of Cosine Similarity Matrix in R

How do I get the top pairs of a cosine similarity matrix like below: southpark_matrix <- structure(c(0, 0.165272735625452, 0.386480286121192, 0.170696960480773, 0.0869562860988618, 0.165272735625452, 0, 0.251690602341816, 0.472701602991984,…

r matrix cosine-similarity

asked Apr 04 '21 at 16:12

nak5120

3 answers

Calculating words similarity score in python

I'm trying to calculate book similarity by comparing the topic lists. I need to get a similarity score between 0 and 1 from the 2 lists. Example: book1_topics = ["god", "bible", "book", "holy", "religion", "Christian"] book2_topics = ["god", "Christ",…

python nlp wordnet cosine-similarity sentence-similarity

asked Apr 02 '21 at 12:33

Sapir

2 answers

Computing Cosine Distance with Differently shaped tensors

I have the following tensor representing a word vector: A = (2, 500), where the first dimension is the BATCH dimension (i.e. A contains two word vectors, each with 500 elements). I also have the following tensor: B = (10, 500). I want to compute the…

pytorch cosine-similarity

asked Feb 25 '21 at 19:11

Joe

4 answers

How to find most optimal number of clusters with K-Means clustering in Python

I am new to clustering algorithms. I have a movie dataset with more than 200 movies and more than 100 users. All the users rated at least one movie. A value of 1 for good, 0 for bad and blank if the annotator has no choice. I want to cluster similar…

python cluster-analysis k-means euclidean-distance cosine-similarity

asked Feb 01 '21 at 10:33

ToBeEXP

2 answers

create a function to compute all pairwise cosine similarity of the row vectors in a 2-D matrix using only numpy

For example, given matrix array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) it should return array([[1. , 0.91465912, 0.87845859], [0.91465912, 1. , 0.99663684], [0.87845859,…

python numpy cosine-similarity

asked Jan 11 '21 at 19:58

RRR

0 answers

Text similarity as probability (between 0 and 1)

I have been trying to compute text similarity such that it'd be between 0 and 1, seen as a probability. The two texts are encoded as two vectors, each a bunch of numbers in [-1, 1]. So as two vectors are given, it seems plausible to use…

similarity cosine-similarity sentence-similarity

asked Nov 16 '20 at 02:15

inverted_index

2 answers

About cosine similarity, how to choose the loss function and the network (I have two plans)

Sorry, I have no clue and I don't know where to find a solution. I'm using two networks to construct two embeddings, and I have a binary target to indicate whether embeddingA and embeddingB "match" or not (1 or -1). The dataset looks like this: embA0 embB0 1.0 embA1…

python neural-network pytorch embedding cosine-similarity

asked Sep 05 '20 at 03:41

island145287
