I'm doing some work trying to recommend documents, and to do so I am using the Cosine Similarity method. Here is the code for that method:
static double cosineSimilarity(HashMap<String, Double> v1, HashMap<String, Double> v2)
{
Set<String> both = v1.keySet();
both.retainAll(v2.keySet());
double sclar = 0, norm1 = 0, norm2 = 0;
for (String k : both)
{
sclar += v1.get(k) * v2.get(k);
}
for (String k : v1.keySet())
{
norm1 += v1.get(k) * v1.get(k);
}
for (String k : v2.keySet())
{
norm2 += v2.get(k) * v2.get(k);
}
return sclar / Math.sqrt(norm1 * norm2);
}
The problem is that the outcome varies depending on the order that the parameters are passed. For example if I call cosineSimilarity(v1, v2)
it will return 0.3
but if I call cosineSimilarity(v2, v1)
it will return a completely different value.
I figure this has something to do with the fact that Map.keySet()
returns a set backed by the map, but I do not fully understand the implications of this.
Can anyone see where the method is going wrong?