Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which needs only multiplications, additions, and square roots to compute.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
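The worked example above can be checked with a few lines of plain Python. This is only a minimal sketch: the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made here for illustration, not part of the tag description or of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as an error, since the angle is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The vectors from the example: dot product 12, norms 5 and 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))
```

Because only the angle matters, scaling either vector by a positive constant leaves the result unchanged. For large matrices a vectorized library routine is usually a better fit than a Python loop like this one.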

1004 questions

1 answer

Fastest way to compute cosine similarity in a GPU

I have a huge TF-IDF matrix with more than a million records, and I would like to find the cosine similarity of this matrix with itself. I am using Colab to run the code, but I am not sure how to best make use of the GPU provided by…

asked Jun 11 '20 at 05:03

kb hithesh

2 answers

How can I get the cosine similarity of all elements of an array with all the other elements in the same array using Tensorflow

Given an array of sentence embeddings (arrays of 512) with a shape of (1000000, 512) how do I calculate the cosine similarity of every one of the 1 million sentence embeddings of the array against every other sentence embedding of the array, ideally…

tensorflow cosine-similarity

asked Jun 05 '20 at 07:45

jdoig

1 answer

How to measure how distinct a document is based on predefined linguistic categories?

I have 3 categories of words that correspond to different types of psychological drives (need-for-power, need-for-achievement, and need-for-affiliation). Currently, for every document in my sample (n=100,000), I am using a tool to count the number…

nlp data-science topic-modeling cosine-similarity word-embedding

asked May 27 '20 at 08:07

SanMelkote

1 answer

cosine_similarity between 2 pandas df column to get cosine distance

I have a dataframe as shown below:

vector_a    vector_b
[1,2,3]     [2,5,6]
[0,2,1]     [2,9,1]
[4,7,1]     [1,7,4]

I would like to do sklearn's cosine_similarity between the columns vector_a and vector_b to get a…

python pandas scikit-learn cosine-similarity

asked Dec 31 '19 at 23:32

atjw94

0 answers

Getting a 1 x N similarity matrix instead of N x N one using Count Vectorizer

I'm trying to create a similarity matrix for a huge dataset. Its dimensions come to 60000 x 60000, which cannot be stored even in 25 GB of RAM, so I wanted to create the similarity scores separately with dimension 1 x 60000, where I get…

python-3.x scikit-learn nlp cosine-similarity countvectorizer

asked Dec 06 '19 at 13:42

Yaboku

2 answers

Bert fine-tuned for semantic similarity

I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched a lot of websites, but I found almost nothing about this downstream task. I just found the STS benchmark. I wonder if I can use the STS benchmark dataset to train a…

nlp cosine-similarity pearson-correlation sentence-similarity

asked Dec 04 '19 at 09:18

Chad

0 answers

Which pyspark abstraction is appropriate for my large matrix multiplication?

I want to perform a large matrix multiplication C = A * B.T and then filter C by applying a stringent threshold, collecting a list of the form (row index, column index, value). A and B are sparse, with mostly zero entries. They are initially…

python apache-spark pyspark sparse-matrix cosine-similarity

asked May 24 '19 at 18:08

brch

1 answer

How cosine similarity differs from Okapi BM25?

I'm conducting research using Elasticsearch. I was planning to use cosine similarity, but I noted that it is unavailable and that BM25 is the default scoring function instead. Is there a reason for that? Is cosine similarity improper for querying…

elasticsearch nlp information-retrieval cosine-similarity

asked Mar 15 '19 at 01:32

Daniel Peixoto

2 answers

Cosine Similarity between Lists of Sentences using Doc2Vec

I'm new to NLP but I'm trying to match a list of sentences to another list of sentences in Python based on their semantic similarity. For example, list1 = ['what they ate for lunch', 'height in inches', 'subjectid'] list2 = ['food eaten two days…

python-3.x nlp data-science cosine-similarity doc2vec

asked Mar 08 '19 at 16:40

m13op22

2 answers

How to find pairs of values greater than a certain cosine distance value?

I have an array: [[ 0.32730174 -0.1436172 -0.3355202 -0.2982458 ] [ 0.50490916 -0.33826587 0.4315952 0.4850834 ] [-0.18594801 -0.06028342 -0.24817085 -0.41029227] [-0.22551994 0.47151482 -0.39798814 -0.14978702] [-0.3315491 0.05832376…

python cosine-similarity pdist

asked Dec 06 '18 at 15:29

M. ahmed

1 answer

Does Euclidean Distance measure the semantic similarity?

I want to measure the similarity between sentences. Can I use sklearn and Euclidean distance to measure the semantic similarity between sentences? I also read about cosine similarity. Can someone explain the difference between those two measures and what…

scikit-learn gensim euclidean-distance cosine-similarity sentence-similarity

asked Nov 11 '18 at 08:57

jenyK

1 answer

Add exception in Spacy tokenizer to not break the tokens with whitespaces?

I am trying to find word similarity between a list of 5 words and a list of 3500 words. The problem that I am facing: The List of 5 words I have are as below …

python-3.x nlp spacy cosine-similarity word-embedding

asked Nov 03 '18 at 07:59

venkatttaknev

1 answer

Python speed up document similarity calculation of corpus

My input is a string in this (spintax) format, "The {PC|Personal Computer|Desktop} is in {good|great|fine|excellent} condition" Then using itertools, I generate all possible combinations. e.g. "The PC is in good condition" "The PC is in great…

python cosine-similarity

asked Oct 23 '18 at 10:24

Mujeeb

2 answers

PostgreSQL: perform cosine similarity search over pre-vectorized database

I'm trying to implement a cosine similarity search on a pre-vectorized database table (like trigram similarity), with objects in this structure: from django.contrib.postgres.fields import ArrayField from django.db import models class…

sql django postgresql search cosine-similarity

asked Sep 20 '18 at 16:57

ShellRox

1 answer

fastest way to perform cosine similarity for 10 million pairs of 1x20 vectors

I have a pandas df with 2 columns, each containing 2.7 million rows of normalized vectors of length 20. I want to take the cosine similarity of column1 - row1 vs column2 - row1, column1 - row2 vs column2 - row2, and so on up to row 2.7 million. I have…

python pandas numpy cosine-similarity

asked Jul 15 '18 at 23:43

Federico Marchese
