Cosine similarity is often used to compare two documents against each other. It measures the cosine of the angle between the two document vectors. If the value is 0, the angle between the two vectors is 90 degrees and the documents share no terms. If the value is 1, the two vectors are the same except for magnitude. Cosine is used when the data is sparse and asymmetric, and when the shared absence of a characteristic should not count as similarity.
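Written out, the usual definition looks like this. The names a and b for the two term-frequency vectors, n for the number of terms, and θ for the angle between them are assumed here and do not come from the text above:

```latex
\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
             = \frac{\sum_{k=1}^{n} a_k b_k}{\sqrt{\sum_{k=1}^{n} a_k^2}\,\sqrt{\sum_{k=1}^{n} b_k^2}}
```

With non-negative term frequencies the value lies between 0 and 1.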
When I compute the cosine for two vectors (documents), I get the result from the following table of term frequencies (TF):
Term     Doc1 (TF)   Doc2 (TF)
London   5           3
Is       2           2
Nice     10          3
City     0           1
I then take the dot product of the two vectors and normalize it by the product of their lengths, which gives Cos(v1,v2) = 0.90 (90%).
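The two-document calculation can be checked with a short Python sketch. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration; the numbers are the TF values from the table.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is given similarity 0
    return dot / (norm_a * norm_b)

# Term order: London, Is, Nice, City (values from the table)
doc1 = [5, 2, 10, 0]
doc2 = [3, 2, 3, 1]

print(round(cosine_similarity(doc1, doc2), 2))  # 0.9
```

Here the dot product is 49 and the lengths are sqrt(129) and sqrt(23), so the result is 49 / sqrt(2967), roughly 0.90.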
But if I have 10 documents, that means I have to compute every pair (45 in total):
Cos(v1,v2)= ?
Cos(v1,v3)= ?
Cos(v1,v4)= ?
Cos(v1,v5)= ?
Cos(v1,v6)= ?
Cos(v1,v7)= ?
Cos(v1,v8)= ?
Cos(v1,v9)= ?
Cos(v1,v10)= ?
Cos(v2,v3)= ?
Cos(v2,v4)= ?
Cos(v2,v5)= ?
And so on, until
Cos(v9,v10)= ?
Then I have to compare the results.
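The pair-by-pair approach described above can be sketched roughly as follows. Only `v1` and `v2` come from the table; `v3` and `v4` are made-up vectors added for illustration, and the names `docs` and `pairs` are assumptions.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is given similarity 0
    return dot / (norm_a * norm_b)

# v1 and v2 are from the table; v3 and v4 are hypothetical examples.
docs = {
    "v1": [5, 2, 10, 0],
    "v2": [3, 2, 3, 1],
    "v3": [0, 1, 4, 2],
    "v4": [1, 0, 0, 3],
}

# One cosine per unordered pair: n * (n - 1) / 2 values for n documents.
pairs = {
    (i, j): cosine_similarity(docs[i], docs[j])
    for i, j in combinations(docs, 2)
}

# Compare the results by listing the most similar pairs first.
for (i, j), score in sorted(pairs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Cos({i},{j}) = {score:.2f}")
```

With 4 documents this produces 6 values; with 10 documents the same loop would produce 45.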
Is there any fast method? How can I get the cosine for 10 or more documents?
I know how to get the cosine for two documents, but how can I get it for more documents? I want the mathematical method.
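One common way to write the many-document case, offered as a sketch rather than a definitive answer, is to stack the TF vectors as the rows of a matrix and normalize each row once. The symbols X (the document-term matrix with rows x_1, ..., x_n), \hat{X} (its row-normalized version), and S (the similarity matrix) are assumed names, not from the text above:

```latex
\hat{x}_i = \frac{x_i}{\lVert x_i \rVert}, \qquad
S = \hat{X}\hat{X}^{\top}, \qquad
S_{ij} = \hat{x}_i \cdot \hat{x}_j = \cos(v_i, v_j)
```

S is symmetric with ones on the diagonal, so only the n(n-1)/2 entries above the diagonal are distinct pairs, which is 45 for n = 10.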