Why is L_p-space normalization used in Mahout's VectorNormMapper for item similarity? I have also read that a norm power of 2 works well for CosineSimilarity.

Is there an intuitive explanation of why it's being used, and how can the best value of the power be determined for a given Similarity class?

Yash Sharma

1 Answer

Vector norms can be defined for any L_p metric. Different norms have different properties, and which one fits depends on the problem you are working on. Common values of p include 1 and 2, with 0 (a count of the non-zero entries, not a true norm) used occasionally.
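As a rough illustration of what normalizing by an L_p norm means, here is a minimal Python sketch. The helper names `lp_norm` and `lp_normalize` are made up for this example and are not Mahout APIs; the sketch only covers p >= 1 (plus infinity), and it leaves an all-zero vector unchanged, which is an assumption rather than a statement about how VectorNormMapper behaves.

```python
import math


def lp_norm(vec, p):
    """Return the L_p norm of vec for p >= 1 (p may be math.inf)."""
    if p == math.inf:
        # L_infinity is the largest absolute component.
        return max(abs(x) for x in vec)
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    return sum(abs(x) ** p for x in vec) ** (1.0 / p)


def lp_normalize(vec, p):
    """Scale vec so that its L_p norm is 1 (zero vectors are returned as-is)."""
    norm = lp_norm(vec, p)
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


v = [3.0, 4.0]
print(lp_norm(v, 1))       # 7.0
print(lp_norm(v, 2))       # 5.0
print(lp_normalize(v, 2))  # [0.6, 0.8]
```

With p = 1 the components of a non-negative vector are rescaled to sum to 1, while p = 2 rescales the vector to unit Euclidean length.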

Certain similarity functions in Mahout are closely related to a particular norm. Your example of the cosine similarity is a good one. The cosine similarity is computed by scaling both input vectors to have L_2 length 1 and then taking their dot product. This value equals the cosine of the angle between the vectors when they are expressed in Cartesian coordinates. It also equals 1 - d^2/2, where d is the L_2 norm of the difference between the normalized vectors.
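The normalize-then-dot-product recipe can be sketched in a few lines of plain Python. This is only an illustration of the definition, not Mahout's implementation; `l2_normalize` and `cosine_similarity` are hypothetical helper names, and rejecting zero vectors is a choice made for this sketch.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean (L_2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product of the two L_2-normalized vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


a = [1.0, 0.0]
b = [1.0, 1.0]
# The angle between a and b is 45 degrees, so this prints roughly 0.7071.
print(cosine_similarity(a, b))
```

Because both vectors are rescaled first, the result depends only on their directions and not on their magnitudes.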

This means that there is an intimate connection between cosine similarity and L_2 distance.
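The connection can be made explicit with a short derivation. Writing \hat{a} and \hat{b} for the L_2-normalized vectors and \theta for the angle between them (notation introduced here for illustration):

```latex
d^2 = \lVert \hat{a} - \hat{b} \rVert_2^2
    = \lVert \hat{a} \rVert_2^2 + \lVert \hat{b} \rVert_2^2 - 2\,\hat{a} \cdot \hat{b}
    = 2 - 2\cos\theta
\quad\Longrightarrow\quad
\cos\theta = 1 - \frac{d^2}{2}
```

Since this maps larger distances to smaller cosines monotonically, ranking items by decreasing cosine similarity gives the same order as ranking them by increasing L_2 distance between the normalized vectors.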

Does that answer your question?

These questions are likely to get answered more quickly on the Apache Mahout mailing lists, btw.

Ted Dunning
  • Great explanation, Ted! Is there any way to estimate the best value of the power for a given Similarity measure, or does it only come from digging into the implementation of the similarity? – Yash Sharma Apr 13 '14 at 15:32
  • There are ways to estimate the power of a statistical test, which apply if the assumptions are met. Unfortunately, in the context where the LLR test is used, the power of a classical frequentist statistical test is pretty much meaningless. – Ted Dunning Sep 08 '14 at 23:16