0

I'm doing some work trying to recommend documents, and to do so I am using the Cosine Similarity method. Here is the code for that method:

static double cosineSimilarity(HashMap<String, Double> v1, HashMap<String, Double> v2) 
{
    Set<String> both = v1.keySet();
    both.retainAll(v2.keySet());
    double sclar = 0, norm1 = 0, norm2 = 0;
    for (String k : both) 
    {
      sclar += v1.get(k) * v2.get(k);
    }
    for (String k : v1.keySet())
    {
      norm1 += v1.get(k) * v1.get(k);
    }
    for (String k : v2.keySet()) 
    {
      norm2 += v2.get(k) * v2.get(k);
    }
    return sclar / Math.sqrt(norm1 * norm2);
}

The problem is that the outcome varies depending on the order that the parameters are passed. For example if I call cosineSimilarity(v1, v2) it will return 0.3 but if I call cosineSimilarity(v2, v1) it will return a completely different value.

I figure this has something to do with the fact that Map.keySet() returns a set backed by the map, but I do not fully understand the implications of this.

Can anyone see where the method is going wrong?

jk47
  • 755
  • 4
  • 10

1 Answers1

6

Try

Set<String> both = new HashSet<String>(v1.keySet());

instead of

Set<String> both = v1.keySet();

You shouldn't modify set you got from keySet method because map uses it and when you remove elements from it (by retainAll in your case) elements also removed from map. Example:

Map<Integer, Integer> mp = new HashMap<Integer, Integer>();
mp.put(1, 1);
System.out.println(mp); // output {1=1}
mp.keySet().remove(1);
System.out.println(mp); // output {}
Mikita Belahlazau
  • 15,326
  • 2
  • 38
  • 43
  • Cheers, didn't realise that `retainAll()` would effect the Map's contents but it makes a lot of sense now – jk47 Dec 28 '12 at 16:15
  • 1
    Here's a technique to reduce the risk of this sort of bug. Limit direct access to the HashMap to a class that builds it. That class can return an unmodifiable version for use in code that is only supposed to use it. – Patricia Shanahan Dec 28 '12 at 16:18