Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply the normalized dot product of the two vectors, which needs only multiplications, additions, and two square roots.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the Euclidean norm is the square root of the sum of the squared terms). For instance, the vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5·5) = 0.48.
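The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not taken from the tag wiki: the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat zero vectors as an error,
        # since the angle is undefined for them.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

Because both norms appear in the denominator, rescaling either vector by a positive factor leaves the result unchanged, which is why the measure reflects orientation rather than magnitude.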

1004 questions
0 votes, 2 answers

Compile error in my parser, think I have my input file wrong, but unsure what I did wrong

So basically this is a parser/cosine matrix calculator, but I keep getting a compile error. I think I have the path for reading my input text file right, but it still won't compile. This is my main class: import…
Jin W
0 votes, 1 answer

KNN with TF-IDF Throwing "Reshape your data" Warnings with Cosine Similarity as a distance metric

I am trying to do KNN using cosine similarity in scikit-learn, but it keeps throwing these warnings. Can someone explain what they mean and why they only appear when I try to fit a KNN model with cosine similarity and not with any…
silent_dev
0 votes, 1 answer

How to do text clustering from cosine similarity

I am using WEKA to perform text collection. Suppose I have n documents with text. I calculated TF-IDF as the feature vector for each document and then calculated the cosine similarity between each pair of documents, which generated an n×n matrix. Now I…
Nhqazi
0 votes, 1 answer

Normalize cosine similarity values calculated based on tf-idf

I compute cosine similarity based on a tf-idf matrix: tfidf_vectorizer_desc = TfidfVectorizer(min_df=5, max_df=0.8, use_idf=True, smooth_idf=True, sublinear_tf=False, tokenizer=tokenize_and_stem) %time tfidf_matrix_desc =…
kitchenprinzessin
0 votes, 1 answer

Are there other useful similarity or distance metrics?

I'm developing an approximate computation system. Defining how similar two objects are is a basic operation in such a system. Usually in computer science and math, similarity is a synonym for distance between two objects, but it is not always…
0 votes, 1 answer

Match the values of 2 arrays

I am trying to create a program that will rate a bunch of arrays to find the one that most closely matches a given array. So say the given array is [1, 80, 120, 155, 281, 301] and one of the arrays to compare against is [-6, 78, 108, 121, 157, 182,…
James Notaro
0 votes, 0 answers

sklearn cosine_distances returns negative values

I have used sklearn's cosine_distances as below. Why does 'dist_desc' contain negative values, even for the same object (item0 vs item0)? tfidf_vectorizer_desc = TfidfVectorizer(use_idf=True, tokenizer=tokenize_and_stem) tfidf_matrix_desc =…
kitchenprinzessin
0 votes, 1 answer

Cosine similarity of each row in a matrix

I have a matrix named vectors[i][j]. I would like to calculate the cosine similarity between each pair of rows. For example, for this matrix v = [1 0 1 0 1 0 0; 0 0 1 1 1 0 1; 1 1 0 0 1 0 1] I want to have the similarity calculated between row 1 and row 2,…
dpointttt
0 votes, 1 answer

Retrieve top n rows based on cosine similarity of vectors in R

I am writing a function to retrieve the top n results from a list of words and their values using cosine similarity. I've included my data below; these are the first few entries of ~400k, but they give you an idea of the structure. the 0.41800 …
CS2016
0 votes, 1 answer

How to compare PVectors with cosine rule

This is probably a really simple thing, but I am stumped. This is part of a much larger thing, so the code is just a snippet. Each of the green circles arranged around the Particle is a Gate. When the mouse moves around the Particle the closest Gate…
0 votes, 1 answer

DeepLearning4J - ParagraphVectors: Why is similarity negative?

I'm using the ParagraphVectors tool in the DeepLearning4J framework. What I'm doing is training a model on a set of text documents and then calculating the similarity between those documents. Now, as the reference page…
0 votes, 1 answer

How can we calculate adjusted cosine similarity for two items represented by their ratings?

I want to compute the adjusted cosine similarity for two items represented by a and b respectively. We take two vectors a={2,3,1,0} and b={1,0,4,2}. I know how cosine similarity works, but I am stuck on the adjusted cosine similarity approach.
0 votes, 1 answer

Finding cosine similarity between dataframes using R

I have two data frames containing information from various hospitals. The first has the number of probable cases of dengue and the second has the number of confirmed cases of dengue. The data is given week by week. I have data for up to 53 weeks, or 1 year. …
amankedia
0 votes, 1 answer

Calculating the Angle Between Vectors by using a vector as a reference point

I have been trying to find a fast algorithm for calculating all the angles between n vectors of length x. For example, if x=3 and n=4, my data would look something like this: A: [1,2,3] B: [2,3,4] C: [...] D: [...] I was wondering is it…
nerdPollution
0 votes, 1 answer

Use Cosine Similarity with Binary Data - Mahout

I have a boolean/binary dataset in which a customer and product id are found when the customer actually bought the product and not found if the customer did not buy it. The dataset is represented like this: Dataset I have tried different approaches like…