Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is a popular similarity measure because it is simply a normalized dot product of the two vectors, which takes only a few basic arithmetic operations to compute.
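Written out for two n-dimensional vectors a and b, the normalized dot product is shown below. The last form makes the "normalized" part explicit: each vector is first scaled to unit length.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\;\sqrt{\sum_{i=1}^{n} b_i^2}}
  = \frac{\mathbf{a}}{\lVert\mathbf{a}\rVert}\cdot\frac{\mathbf{b}}{\lVert\mathbf{b}\rVert}
```

Scaling either vector by a positive constant cancels between numerator and denominator, so the value depends only on orientation. It is undefined when either vector is the zero vector.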

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the Euclidean norm of each (the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
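The worked example above can be checked with a short Python sketch. The function name `cosine_similarity` is our own, not a library API; the second call doubles the first vector to show that the result does not change with magnitude.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # → 0.48
# Doubling the first vector changes its norm but not its direction.
print(cosine_similarity((0, 6, 8), (-3, 4, 0)))  # → 0.48
```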

1004 questions
0 votes, 0 answers

How to use GloVe pretrained vectors to find the similarity between two words, and later between two documents?

I am trying to find the cosine similarity between two documents. To start with, I am trying to use it to find the similarity between two words. I want to load the pretrained vectors and then use them for this task. There is not enough documentation…
Krishna Aswani
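A minimal sketch of the word-level step, assuming the plain-text GloVe format (one word per line followed by its space-separated vector components, as in the `glove.6B.*.txt` files). The helper names `load_glove`, `cosine` and `document_vector` are hypothetical, and averaging word vectors is only one simple baseline for a document vector, not the only option.

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe text format: each line is a word followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def document_vector(text, vectors):
    """Average the vectors of in-vocabulary tokens (a simple baseline)."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        raise ValueError("no token of the document is in the vocabulary")
    return np.mean([vectors[t] for t in tokens], axis=0)

# Toy 3-dimensional vectors standing in for a real GloVe file; with a real file,
# pass open(path, encoding="utf-8") to load_glove instead of this StringIO.
toy = io.StringIO("king 0.9 0.1 0.0\nqueen 0.8 0.2 0.0\napple 0.0 0.1 0.9\n")
vecs = load_glove(toy)
print(cosine(vecs["king"], vecs["queen"]))
print(cosine(document_vector("the king", vecs), document_vector("a queen", vecs)))
```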
0 votes, 0 answers

Pandas to SQL Server column limit for Cosine Similarity

I'm calculating cosine similarity using NLTK and exporting the cosine similarity values to SQL Server, which I would like to use for other reporting purposes. I have about 4773 columns with about 2k rows, and SQL Server does not support this number of…
RData
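One way around a per-table column limit, sketched here as an assumption about the data layout rather than the asker's code, is to store the similarities in long format: one row per (row item, column item) pair and only three columns. It works for a square or rectangular similarity table; roughly 2k × 4773 values become about 9.5 million rows. The helper names are hypothetical and the count matrix is made-up data.

```python
import numpy as np
import pandas as pd

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X (assumes no all-zero rows)."""
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def to_long_format(sim_df):
    """Turn a wide similarity table into rows of (row_item, col_item, similarity)."""
    return (sim_df.rename_axis("row_item")
                  .reset_index()
                  .melt(id_vars="row_item", var_name="col_item", value_name="similarity"))

# Hypothetical 3-document term-count matrix, just to show the output shape.
docs = ["doc1", "doc2", "doc3"]
counts = np.array([[1, 0, 2], [2, 0, 4], [0, 3, 0]])
sim_df = pd.DataFrame(cosine_matrix(counts), index=docs, columns=docs)
long_df = to_long_format(sim_df)
print(long_df)
```

The long frame always has three columns, however many items there are, so it can be written with pandas' `DataFrame.to_sql`. For a symmetric matrix you could also keep only one triangle to halve the row count.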
0 votes, 1 answer

K-means Algorithm and Cosine Distance

I have used the k-means algorithm with Euclidean distance to cluster my dataset. Then I tried cosine distance, but the algorithm does not converge with the cosine metric (it never stops; the iteration count reaches 1000). Any suggestions, please?
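Plain k-means is built around squared Euclidean distance, so using cosine distance only in the assignment step may leave the two steps optimizing different objectives. A common alternative (a swapped-in technique, not the asker's code) is spherical k-means: normalize every vector to unit length, assign each point to the centroid with the highest cosine similarity, and reset each centroid to the re-normalized mean of its members. A minimal NumPy sketch that stops once assignments no longer change; the function name and data are hypothetical.

```python
import numpy as np

def spherical_kmeans(X, k, max_iter=100, seed=0):
    """k-means variant for cosine similarity; returns labels, unit centroids, iterations."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no all-zero rows
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    labels = np.full(len(X), -1)
    for n_iter in range(1, max_iter + 1):
        # Assign each point to the centroid with the highest cosine similarity.
        new_labels = np.argmax(X @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # New centroid = mean direction of its members, re-normalized to unit length.
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid for an empty or degenerate cluster
                centroids[j] = total / norm
    return labels, centroids, n_iter

# Two groups that differ in direction but have very different magnitudes.
X = np.array([[1, 0.1], [5, 0.2], [2, 0.1], [0.1, 1], [0.2, 3], [0.1, 4]])
labels, centroids, n_iter = spherical_kmeans(X, k=2)
print(labels, n_iter)
```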
0 votes, 1 answer

Ranking evaluation approach in two stage document retrieval

I have created a two-stage ranking system based on textual similarity (cosine similarity) between query-document pairs. Now I need to validate whether the items my system retrieves, in the order it ranks them, are correct with respect to the user,…
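A standard way to score a ranked list against relevance judgements is a graded metric such as nDCG@k. The sketch below assumes you have collected relevance grades for the documents your system returned, in ranked order; the grades are hypothetical, and computing the ideal ordering from the judged results only is a simplification.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results: sum of rel / log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical grades (2 = highly relevant, 0 = not relevant) in the order
# the ranker returned the documents.
judged = [2, 0, 1, 2, 0]
print(ndcg_at_k(judged, k=5))
```

A value of 1.0 means the system put the judged documents in the best possible order; averaging over many test queries gives a single number to compare ranking variants.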
0 votes, 1 answer

Cosine Similarity for a user-based collaborative system

I have 2 users (u1 and u2) who have rated 2 movies (m1 and m2):

        m1  m2
    u1   1   1
    u2   5   5

When I calculate item-based cosine similarity, (1,5)·(1,5) / (|(1,5)| |(1,5)|) = 1 (m1 and m2 are exactly similar). When I calculate…
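The question's numbers can be reproduced with a short sketch (plain Python; the `cosine` helper is our own). Both the item-based similarity of m1 and m2 and the user-based similarity of u1 and u2 come out as 1, up to floating-point rounding: u2's ratings are five times u1's, and cosine similarity ignores magnitude, so a user who rates everything 1 looks identical to one who rates everything 5.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Ratings from the question: rows are users, columns are movies.
ratings = {"u1": {"m1": 1, "m2": 1},
           "u2": {"m1": 5, "m2": 5}}
users, movies = ["u1", "u2"], ["m1", "m2"]

# Item-based: compare movie columns, m1 = (1, 5) and m2 = (1, 5).
m1 = [ratings[u]["m1"] for u in users]
m2 = [ratings[u]["m2"] for u in users]
# User-based: compare user rows, u1 = (1, 1) and u2 = (5, 5).
u1 = [ratings["u1"][m] for m in movies]
u2 = [ratings["u2"][m] for m in movies]

print(cosine(m1, m2))  # item-based similarity of m1 and m2
print(cosine(u1, u2))  # user-based similarity of u1 and u2
```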
0 votes, 1 answer

Calculate pairwise similarity/distance between rows with conditional values in pandas

I'm trying to compute the distance between values in rows that share a category. For user_id 1, the Par1 distance is between 1 and 7, and the Par2 distance is between 10 and 20.

    df1 = pd.DataFrame({"user_id": [1, 2, 1, 2], "Par1": [1, 3, 7, 9], "Par2": [10,…
lrn2code
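A minimal pandas sketch of one approach: a self-join on `user_id`, then absolute differences for each pair of rows in the same group. The function name is hypothetical, and since the question's `Par2` list is cut off, only user 1's values 10 and 20 come from the text; user 2's values 30 and 45 are made up.

```python
import pandas as pd

def pairwise_within_group(df, key, cols):
    """Absolute differences in `cols` for every pair of rows sharing the same `key`."""
    # Keep the original row position as a column so pairs can be de-duplicated.
    d = df.reset_index(drop=True).rename_axis("row").reset_index()
    # Self-join on the key: every row is paired with every row of its group.
    pairs = d.merge(d, on=key, suffixes=("_a", "_b"))
    pairs = pairs[pairs["row_a"] < pairs["row_b"]]  # keep each unordered pair once
    out = pairs[[key, "row_a", "row_b"]].copy()
    for c in cols:
        out[f"{c}_dist"] = (pairs[f"{c}_a"] - pairs[f"{c}_b"]).abs()
    return out.reset_index(drop=True)

df1 = pd.DataFrame({"user_id": [1, 2, 1, 2],
                    "Par1": [1, 3, 7, 9],
                    "Par2": [10, 30, 20, 45]})  # 30 and 45 are hypothetical
result = pairwise_within_group(df1, "user_id", ["Par1", "Par2"])
print(result)
```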
0 votes, 0 answers

SQL: Compute Cosine Similarity with a Specific Vector

I have an Item table and a Vector table:

    CREATE TABLE Item (
        itemID INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(256)
    );
    CREATE TABLE Vector (
        itemID INT REFERENCES Item(itemID),
        dim INT,
        value FLOAT
    );

So for instance, if item 3 has vector…
Jin Sakuma
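A sketch using Python's built-in `sqlite3` module so it runs anywhere. It assumes the query vector is loaded into a helper table (`QueryVector`, a hypothetical name) with the same `(dim, value)` layout, and replaces MySQL's `AUTO_INCREMENT` with SQLite's `INTEGER PRIMARY KEY`. The SQL takes the dot product over shared dimensions and divides by both norms; items with no dimension in common with the query are simply absent from the result. The sample vectors are made up.

```python
import math
import sqlite3

def cosine_to_query(conn, query):
    """Cosine similarity between every item's vector and `query` (dim -> value)."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS QueryVector (dim INT, value FLOAT)")
    conn.execute("DELETE FROM QueryVector")
    conn.executemany("INSERT INTO QueryVector VALUES (?, ?)", query.items())
    sql = """
        WITH item_norm AS (
            SELECT itemID, SQRT(SUM(value * value)) AS norm
            FROM Vector GROUP BY itemID
        ),
        query_norm AS (
            SELECT SQRT(SUM(value * value)) AS norm FROM QueryVector
        )
        SELECT v.itemID,
               SUM(v.value * q.value) / (n.norm * qn.norm) AS cosine
        FROM Vector v
        JOIN QueryVector q ON q.dim = v.dim   -- dot product over shared dimensions
        JOIN item_norm n ON n.itemID = v.itemID
        CROSS JOIN query_norm qn
        GROUP BY v.itemID, n.norm, qn.norm
        ORDER BY cosine DESC
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
# SQLite has SQRT only when built with its math functions, so register Python's.
conn.create_function("SQRT", 1, math.sqrt)
conn.executescript("""
    CREATE TABLE Item (itemID INTEGER PRIMARY KEY, name VARCHAR(256));
    CREATE TABLE Vector (itemID INT REFERENCES Item(itemID), dim INT, value FLOAT);
""")
conn.executemany("INSERT INTO Item (itemID, name) VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO Vector VALUES (?, ?, ?)",
                 [(1, 0, 1.0), (1, 1, 0.0),
                  (2, 0, 0.0), (2, 1, 1.0),
                  (3, 0, 1.0), (3, 1, 1.0)])
results = cosine_to_query(conn, {0: 1.0, 1: 1.0})
print(results)
```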
0 votes, 1 answer

Compute similarity between n entities

I am trying to compute the similarity between n entities that are described by entity_id, type_of_order and total_value. An example of the data might look like:

    NR  entity_id  type_of_order  total_value
    1   1          A              10
    2   1          …
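One possible approach, assuming each entity should be compared by how its total_value is spread across order types: pivot to one row per entity and one column per `type_of_order`, then take the cosine similarity of the rows. Apart from the first row (1, 1, A, 10), the data below is hypothetical.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "NR": [1, 2, 3, 4, 5, 6],
    "entity_id": [1, 1, 2, 2, 3, 3],
    "type_of_order": ["A", "B", "A", "B", "A", "C"],
    "total_value": [10, 20, 30, 60, 5, 50],
})

# One row per entity, one column per order type, summing total_value.
profile = orders.pivot_table(index="entity_id", columns="type_of_order",
                             values="total_value", aggfunc="sum", fill_value=0)

# Normalize each entity's row to unit length; the dot products are then cosines.
X = profile.to_numpy(dtype=float)
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
similarity = pd.DataFrame(Xn @ Xn.T, index=profile.index, columns=profile.index)
print(similarity.round(3))
```

Entities 1 and 2 come out as identical because entity 2 spends exactly three times as much in the same proportions; if absolute spend should matter too, cosine alone will not capture it.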
0 votes, 1 answer

Product Price Comparison Tool: Difficulty Matching Identical Items

I'm working on creating an e-commerce product price comparison tool (in Python), somewhat similar to camelcamelcamel.com, both for fun and profit. I'm having difficulty matching the identical items from the list that I gathered…
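A simple baseline for matching titles (an assumption about the data, not the asker's code) is cosine similarity between bag-of-words count vectors of the product titles. The helper names and listings are hypothetical. The second pair, which differs only in storage size, still scores about 0.8, so a similarity threshold alone may not separate product variants.

```python
import math
import re
from collections import Counter

def tokens(title):
    """Lower-case alphanumeric tokens, so punctuation and word order do not matter."""
    return re.findall(r"[a-z0-9]+", title.lower())

def title_similarity(a, b):
    """Cosine similarity between the bag-of-words count vectors of two titles."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Hypothetical listings from two shops.
print(title_similarity("Apple iPhone 12 (64GB) - Black", "iPhone 12 64GB Black, Apple"))
print(title_similarity("Apple iPhone 12 (64GB) - Black", "Apple iPhone 12 (128GB) - Black"))
```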
0 votes, 2 answers

Does Lucene (Java framework) calculate the tf-idf and cosine similarity of a document against the term by default?

I am developing a search-engine-based application using the Lucene Java framework, and I am confused by the scoring functionality Lucene provides by default, i.e. does the default scoring implement tf-idf and cosine…
Hamdan Sultan
0 votes, 1 answer

Mahout: What is the value returned by AverageAbsoluteDifferenceEvaluator on TanimotoCoefficientSimilarity?

I'm trying to improve the Mahout recommendation implementation in a project, and I found out that my predecessor used tanimotoCoefficientSimilarity for a dataset with preference values from 1 to 5. I changed it to UncenteredCosineSimilarity, and now I'm…
zoonoo
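To see why the two measures can behave very differently on 1-5 ratings, here is a plain-Python illustration, not Mahout's implementation (which may, for instance, treat missing items differently). Mahout's documentation describes `TanimotoCoefficientSimilarity` as using only the presence or absence of preferences, not their values; the Tanimoto coefficient below likewise looks only at which items were rated, while the uncentered cosine uses the rating values. The two users are hypothetical.

```python
import math

def tanimoto(prefs_a, prefs_b):
    """Tanimoto (Jaccard) coefficient of the two sets of rated items; ratings are ignored."""
    a, b = set(prefs_a), set(prefs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def uncentered_cosine(prefs_a, prefs_b):
    """Plain cosine of the rating vectors, treating unrated items as 0."""
    dot = sum(prefs_a[i] * prefs_b[i] for i in prefs_a.keys() & prefs_b.keys())
    norm_a = math.sqrt(sum(r * r for r in prefs_a.values()))
    norm_b = math.sqrt(sum(r * r for r in prefs_b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical users on a 1-5 scale who rated the same two movies in opposite ways.
u1 = {"m1": 1, "m2": 5, "m3": 3}
u2 = {"m1": 5, "m2": 1, "m4": 2}
print(tanimoto(u1, u2))           # 2 shared items out of 4 rated in total
print(uncentered_cosine(u1, u2))
```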
0 votes, 2 answers

Cluster Scenario: Difference between the computedCost of 2 points used as a similarity measure between points. Is it applicable?

I want a measure of similarity between two points in a cluster. Would the similarity calculated this way be an acceptable measure of similarity between the two data points? Say I have two vectors, vector A and vector B, that are in the same…
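A small counterexample, assuming a point's cost is its squared Euclidean distance to its cluster centre (the quantity k-means adds up into its total cost): two points at equal distance on opposite sides of the centre have identical costs, so their cost difference is 0, yet they are 2 units apart and point in opposite directions. The helper name `point_cost` is hypothetical.

```python
import numpy as np

def point_cost(point, centre):
    """Squared Euclidean distance from a point to its cluster centre."""
    diff = np.asarray(point, dtype=float) - np.asarray(centre, dtype=float)
    return float(diff @ diff)

centre = np.array([0.0, 0.0])
A = np.array([1.0, 0.0])
B = np.array([-1.0, 0.0])

cost_gap = abs(point_cost(A, centre) - point_cost(B, centre))
direct_distance = float(np.linalg.norm(A - B))
cosine = float(A @ B / (np.linalg.norm(A) * np.linalg.norm(B)))
print(cost_gap, direct_distance, cosine)  # prints: 0.0 2.0 -1.0
```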
0 votes, 0 answers

Calculating a measure of similarity between texts: Memory Error

Currently I'm working with texts. My main goal is to calculate a measure of similarity between 30,000 texts. I'm following this tutorial:

Creating a document-term matrix:

    In [1]: import numpy as np  # a conventional alias
    In [2]: from…
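If the memory error comes from materializing the full pairwise matrix, the arithmetic explains it: 30,000 × 30,000 is 9 × 10^8 entries, about 7.2 GB as 64-bit floats. A minimal NumPy sketch (hypothetical function name, dense input for simplicity) processes the rows in chunks and keeps only each document's k most similar neighbours.

```python
import numpy as np

def topk_cosine_chunked(X, k=5, chunk_size=1000):
    """Indices and similarities of each row's k nearest rows by cosine similarity.

    Only a (chunk_size x n) block of similarities exists at any time,
    never the full n x n matrix. Assumes at least two rows.
    """
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # empty documents stay all-zero instead of dividing by 0
    Xn = X / norms
    n = Xn.shape[0]
    k = min(k, n - 1)
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = Xn[start:stop] @ Xn.T  # similarities of this chunk to all rows
        local = np.arange(stop - start)
        block[local, local + start] = -np.inf  # ignore each row's similarity to itself
        # argpartition finds the k largest without fully sorting each row...
        part = np.argpartition(-block, k - 1, axis=1)[:, :k]
        part_sim = np.take_along_axis(block, part, axis=1)
        # ...then only those k are sorted, highest similarity first.
        order = np.argsort(-part_sim, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sim, order, axis=1)
    return top_idx, top_sim

# Toy document-term counts: documents 0/1 and 2/3 use the same terms.
dtm = np.array([[1, 0], [2, 0], [0, 1], [0, 3]])
neighbours, scores = topk_cosine_chunked(dtm, k=1, chunk_size=3)
print(neighbours.ravel(), scores.ravel())
```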
0 votes, 1 answer

Weighted Cosine Similarity on Sparse Vectors

I am trying to compute the similarity between 2 sparse vectors using cosine similarity, which is working fine. However, I would like to take the additional step of introducing a weight for each index of the vector, e.g. where the vectors to…
holtc
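One common definition of weighted cosine (an assumption; other weightings exist) multiplies each index's contribution by a non-negative weight w_i in the dot product and in both norms, which is the same as scaling coordinate i by sqrt(w_i) and taking the ordinary cosine. A sketch on sparse `{index: value}` dicts, where indices missing from the weight dict default to weight 1 (also an assumption); the function name is hypothetical.

```python
import math

def weighted_cosine(a, b, w):
    """Weighted cosine of sparse vectors stored as {index: value} dicts.

    Each index i contributes w[i] * a[i] * b[i]; indices missing from w get weight 1.
    """
    def wdot(x, y):
        # Only indices present in both vectors contribute to a sparse dot product.
        return sum(w.get(i, 1.0) * x[i] * y[i] for i in x.keys() & y.keys())
    denom = math.sqrt(wdot(a, a)) * math.sqrt(wdot(b, b))
    return wdot(a, b) / denom if denom else 0.0

a = {0: 1.0, 1: 1.0}
b = {0: 1.0}
print(weighted_cosine(a, b, {}))                # unweighted: 1 / sqrt(2)
print(weighted_cosine(a, b, {0: 1.0, 1: 3.0}))  # index 1 weighted 3x: 1 / 2
```

Up-weighting index 1, where only `a` is non-zero, lowers the similarity because it makes the part of `a` that `b` lacks count for more.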
0 votes, 1 answer

Formulate a query and rank answers via cosine similarity in Python

I tokenized multiple text files and created a tf-idf matrix from them:

           Token 1   Token 2   Token 3
    Doc 1  0.00..    0.0002    0.0003
    Doc 2  0.00..    ...       ...
    Doc 3  ...       ...       ...
    ...

How do I now formulate a query, say for token 1 and token 3?…
adw
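A common way to do this, sketched here under stated assumptions, is to represent the query as a vector in the same term space as the tf-idf matrix and rank documents by cosine similarity to it. The query below is encoded with 1.0 for each query token (it could instead use idf weights), the function name is hypothetical, and the matrix values are made up because the question shows only part of its matrix.

```python
import numpy as np

def rank_documents(tfidf, vocabulary, query_tokens):
    """Rank documents (rows of `tfidf`) by cosine similarity to a query.

    The query gets 1.0 for each query token found in `vocabulary`, 0.0 elsewhere.
    """
    q = np.zeros(len(vocabulary))
    for token in query_tokens:
        if token in vocabulary:
            q[vocabulary.index(token)] = 1.0
    denom = np.linalg.norm(tfidf, axis=1) * np.linalg.norm(q)
    # Documents (or queries) with zero norm get a score of 0 instead of dividing by 0.
    scores = np.divide(tfidf @ q, denom, out=np.zeros(len(tfidf)), where=denom > 0)
    order = np.argsort(-scores, kind="stable")  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical tf-idf values: rows are documents, columns follow `vocabulary`.
vocabulary = ["token1", "token2", "token3"]
tfidf = np.array([[0.0010, 0.0002, 0.0003],
                  [0.0000, 0.0040, 0.0000],
                  [0.0020, 0.0000, 0.0020]])
ranking = rank_documents(tfidf, vocabulary, ["token1", "token3"])
print(ranking)
```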