Questions tagged [cosine-similarity]

Cosine similarity measures the similarity between two vectors of an inner product space as the cosine of the angle between them. It is popular because it is a normalized dot product of the two vectors, which can be computed with simple arithmetic operations.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/5/5 = 0.48.
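
The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the choice to raise an error for a zero vector (where the ratio would be 0/0) are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of the squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: treat a zero vector as an error,
    # because the ratio would be 0/0 and the angle is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

Because only the direction of each vector matters, scaling either input by a positive constant leaves the result unchanged.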

1004 questions
0
votes
0 answers

Calculate distance from densest part of cosine similarity 2d distribution

Forgive me in advance if my terminology sounds a bit vague, but I am trying to explain my problem in plain English. Let's say I have 10 sets of documents and for each set I have calculated the cosine similarity matrix based on the term frequency…
CptNemo
  • 6,455
  • 16
  • 58
  • 107
0
votes
0 answers

How to automate a focused web crawler's evaluation (precision & recall)

There was a question about this, but the user was satisfied (probably?) with knowing about precision, recall and F1 score, so I'll extend it: To compute precision & recall, you need the TP, FN, TN and FP values. Out of the box, after a crawl, you…
clausavram
  • 546
  • 8
  • 14
0
votes
0 answers

Cosine similarity calculation issue

I am having issues calculating the cosine similarity between 2 strings. I calculate the binary vector format of each string using a function. It gives out binary vectors which are in the form of, say, (1,1,1,1,1,0,0,0,0). public static…
Biju
  • 106
  • 1
  • 10
0
votes
1 answer

What's the best way to obtain cosine similarity from two vectors in MATLAB?

I'll need to repeat this process multiple times, and the number of values will vary from ~10 to ~1000. I don't have access to all the vectors at once - they'll become accessible to me two vectors at a time. In each instance there will always be the…
0
votes
1 answer

is it possible in content based recommendation

I was exploring content-based algorithms and learnt that they work by calculating the similarity between item and user, as "pandora" does. My requirement is that I have a scale of a hundred; for example, a user can…
Prabjot Singh
  • 4,491
  • 8
  • 31
  • 51
0
votes
1 answer

Quickly compare cosine similarity of query with documents in a corpus

I'm curious as to how companies generally compute the cosine similarity quickly among an entire corpus. As an example, if someone searched for the terms "funny cats", and there are 100,000 documents that have at least one of those terms, calculating…
Tim S
  • 5,023
  • 1
  • 34
  • 34
0
votes
1 answer

Cosines similarity on large data sets

Currently I'm studying data mining and text comparison, and have found this one: https://en.wikipedia.org/wiki/Cosine_similarity. Since I have successfully implemented this algorithm to compare two strings, I have decided to try some more complex…
0
votes
1 answer

Average in Adjusted cosine Similarity

What is the denominator in the average rating of a user in adjusted cosine similarity (item-based collaborative filtering)? Is it the number of all items in the system, or just the number of items rated by the user? And is there a function in MATLAB for adjusted…
Anna
  • 13
  • 7
0
votes
0 answers

How do we ignore the order of letters in calculating Levenshtein distance?

This question is not new, and I have seen some form of explanation here and here. Both methods described performing n-gram (mostly bigram) calculations on the terms of query 1 and query 2 and then finding the cosine similarity. I was hoping for a…
jxn
  • 7,685
  • 28
  • 90
  • 172
0
votes
0 answers

Both people rated a product with 0 star

If we have: User 1, rated product A with 0 star. User 2, rated product A with 0 star. What is the Pearson's correlation coefficient or Cosine Similarity between them? According to the formula, it should be 0/0. But what is 0/0? It is not a…
0
votes
1 answer

use values(features) in a vector to calculate cosine similarity for opencv

I have recently been working on a project in which I have extracted some features of an image, and I want to find out whether there are any similarities between two images using those features. Here is the list of features that I have extracted: Aspect…
Shruthi Kodi
  • 107
  • 1
  • 3
  • 10
0
votes
1 answer

Euclidean vs Cosine for text data

If I use a tf-idf feature representation (or just document length normalization), are Euclidean distance and (1 - cosine similarity) basically the same? All the textbooks I have read, and other forums and discussions, say cosine similarity works better…
0
votes
1 answer

Vector Space Model Introduction

What are the different types of VSM (vector space model)? One which I know (as per wiki) is tf-idf (cosine similarity is used in this method, but it's not a separate method). What are the other ways? Also, what are the different dimensions of a word in a…
divyum
  • 1,286
  • 13
  • 20
0
votes
1 answer

Create concept vector from ontology

I have a set of documents pertaining to a domain. The data in those documents can be conceptually mapped to a domain ontology. I need to find similarity scores between those docs. In literature, many have proposed to create a vector of…
0
votes
1 answer

Cosine Similarity in Java

I want to calculate the similarity of the rows of a matrix such as D, but the results are not correct! What is the problem with my code? To calculate the similarity of the rows in matrix U, I did as below. As the results show, the similarity of the rows is…
Shokouh Dareshiri
  • 826
  • 1
  • 12
  • 24