Questions tagged [cosine-similarity]

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. It is a popular similarity measure between two vectors because it is calculated as a normalized dot product between the two vectors, which can be calculated with simple mathematical operations.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (each norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5 × 5) = 0.48.
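The worked example above can be checked with a few lines of plain Python. This is a minimal sketch, not a library API: the function name `cosine_similarity` and the choice to raise an error on zero-length vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Illustrative helper (name and error handling are assumptions).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))

    # Euclidean norm: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The angle is undefined for a zero vector, so refuse to divide by zero.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


# Vectors from the tag description: dot product 12, norms 5 and 5.
example = cosine_similarity((0, 3, 4), (-3, 4, 0))

# Scaling a vector changes its magnitude but not its orientation.
scaled = cosine_similarity((0, 6, 8), (-3, 4, 0))
```

Here `example` and `scaled` should both come out at about 0.48 (up to floating-point rounding), which illustrates that the measure depends on orientation and not on magnitude.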

1004 questions

0 answers

How to use GloVe pretrained vectors to find similarity between two words and later between two documents?

I am trying to find cosine similarity between two documents. To start with, I am trying to use it to find similarities between two words. I want to load the pretrained vectors and then use them to do the given task. There is not enough documentation…

python word2vec cosine-similarity

asked Dec 19 '16 at 15:49

Krishna Aswani

0 answers

Pandas to SQL Server column limit for Cosine Similarity

I'm calculating cosine similarity using NLTK and exporting the cosine similarity values to SQL Server, which I would like to use for other reporting purposes. I have about 4773 columns with about 2k rows and SQL Server does not support this number of…

sql-server python-3.x pandas cosine-similarity

asked Dec 09 '16 at 20:28

RData

1 answer

K-means algorithm and cosine distance

I have used the K-means algorithm with Euclidean distance to cluster my dataset. Then I tried cosine distance, but the algorithm does not converge with the cosine metric (it does not stop; the iteration count reaches 1000). Any suggestions, please?

cluster-analysis cosine-similarity

asked Dec 06 '16 at 07:01

nvayien iaziz

1 answer

Ranking evaluation approach in two stage document retrieval

I have created a two-stage ranking system based on textual similarity (cosine similarity) between query-document pairs. Now I need to validate whether the items retrieved and ranked by my system are correct with respect to the user,…

ranking information-retrieval cosine-similarity ranking-functions

asked Nov 28 '16 at 11:41

pankaj kashyap

1 answer

Cosine similarity for a user-based collaborative system

I have 2 users (u1 and u2) who have rated 2 movies (m1 and m2): u1 rated m1 = 1, m2 = 1; u2 rated m1 = 5, m2 = 5. When I calculate item-based cosine similarity, (1,5)·(1,5)/(|(1,5)||(1,5)|) = 1 (m1 and m2 are exactly similar). When I am calculating…

recommendation-engine cosine-similarity collaborative-filtering

asked Nov 24 '16 at 19:10

Asmit

1 answer

Calculate pairwise similarity/distance between rows with conditional values in pandas

I'm trying to compute the distance between values in rows that share a category. For user_id 1, the Par1 distance is between 1 and 7 and the Par2 distance is between 10 and 20. df1 = pd.DataFrame({"user_id":[1,2,1,2], "Par1":[1, 3, 7,9], "Par2":[10,…

python pandas dataframe cosine-similarity

asked Nov 22 '16 at 15:25

lrn2code

0 answers

SQL: compute cosine similarity with a specific vector

I have an Item table and a Vector table: CREATE TABLE Item ( itemID INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(256) ); CREATE TABLE Vector ( itemID INT REFERENCES Item(itemID), dim INT, value FLOAT ); So for instance, if item 3 has vector…

mysql sql cosine-similarity

asked Nov 20 '16 at 05:16

Jin Sakuma

1 answer

Compute similarity between n entities

I am trying to compute the similarity between n entities that are being described by entity_id, type_of_order, total_value. An example of the data might look like: NR entity_id type_of_order total_value 1 1 A 10 2 1 …

machine-learning similarity data-science cosine-similarity

asked Nov 10 '16 at 02:29

Marc Zaharescu

1 answer

Product price comparison tool: difficulty in matching identical items

I'm working on creating an e-commerce product price comparison tool (in Python) which is somewhat similar to camelcamelcamel.com, both for fun and profit. I'm having difficulty matching the identical items from the list that I gathered…

python machine-learning nlp information-retrieval cosine-similarity

asked Nov 07 '16 at 08:03

Emacs

2 answers

Does Lucene (Java framework) calculate the tf-idf and cosine similarity of a document against the term by default?

I am developing a search-engine-based application and working with the Lucene Java framework. I am confused by the default scoring functionality provided by Lucene, i.e. does the scoring functionality implement tf-idf and cosine… by default

java lucene search-engine tf-idf cosine-similarity

asked Oct 19 '16 at 15:49

Hamdan Sultan

1 answer

Mahout : What is the value returned by AverageAbsoluteDifferenceEvaluator on TanimotoCoefficientSimilarity?

I'm trying to improve the Mahout recommendation implementation in a project, and I found out that my predecessor used TanimotoCoefficientSimilarity for a dataset with preference values 1-5. I changed it to UncenteredCosineSimilarity, and now I'm…

mahout mahout-recommender cosine-similarity

asked Oct 11 '16 at 01:22

zoonoo

2 answers

Cluster scenario: difference between the computeCost of 2 points used as a similarity measure between points. Is it applicable?

I want to have a measure of similarity between two points in a cluster. Would the similarity calculated this way be an acceptable measure of similarity between the two data points? Say I have two vectors: vector A and vector B that are in the same…

apache-spark machine-learning cluster-analysis apache-spark-mllib cosine-similarity

asked Sep 28 '16 at 21:26

Mnemosyne

0 answers

Calculating a measure of similarity between texts: Memory Error

Currently I'm working with texts. My main goal is to calculate a measure of similarity between 30 000 texts. I'm following this tutorial: Creating a document-term matrix: In [1]: import numpy as np # a conventional alias In [2]: from…

python-2.7 text distance cosine-similarity

asked Sep 22 '16 at 10:35

PineapplePizza

1 answer

Weighted Cosine Similarity on Sparse Vectors

I am trying to compute the similarity between 2 sparse vectors using cosine similarity, which is working fine. However, I would like to take the additional step of introducing a weighting to each index of the vector, e.g. where the vectors to…

java vector cosine-similarity

asked Sep 02 '16 at 13:16

holtc

1 answer

Formulate a query and rank answers via cosine similarity in Python

I tokenized multiple text files and created a tf-idf matrix from that: Token 1 Token 2 Token 3 Doc 1 0.00.. 0.0002 0.0003 Doc 2 0.00.. ... ... Doc 3 ... ... ... ... How do I now formulate a query, say for token 1 and token 3?…

python jupyter cosine-similarity

asked Aug 11 '16 at 07:59

adw
