I would like to measure the quality of a clustering using the Quantization Error, but I can't find any clear information on how to compute this metric.
The few documents/articles I've found are:
- "Estimating the number of clusters in a numerical data set via quantization error modeling" (unfortunately there's no free access to this paper)
- A question posted back in 2011 on Cross Validated about the different types of distance measures (the question is very specific and doesn't say much about the calculation)
- This gist, where a `quantization_error` function (at the very end of the code) is implemented in Python
Regarding the third link (the best piece of information I've found so far), I don't know how to interpret the calculation (see the snippet below; the `#` annotations are mine, and question marks indicate steps that are unclear to me):
def quantization_error(self):
    """
    This method calculates the quantization error of the given clustering
    :return: the quantization error
    """
    total_distance = 0.0
    s = Similarity(self.e)  # Class containing different types of distance measures
    # For each point, compute the squared fractional distance between the point and its assigned centroid ?
    for i in range(len(self.solution.patterns)):
        total_distance += math.pow(s.fractional_distance(self.solution.patterns[i], self.solution.centroids[self.solution.solution[i]]), 2.0)
    return total_distance / len(self.solution.patterns)  # Divide total_distance by the total number of points ?
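For what it's worth, here is my current reading of the snippet as a self-contained NumPy sketch: it seems to compute the mean, over all points, of the squared distance between each point and the centroid of its assigned cluster. (I've substituted plain Euclidean distance for the gist's `fractional_distance`; the function and variable names here are my own, not from the gist.)

```python
import numpy as np

def quantization_error(points, centroids, assignments):
    """points: (n, d) array; centroids: (k, d) array;
    assignments: length-n integer array mapping each point to its cluster."""
    # Vector from each point to its assigned centroid
    diffs = points - centroids[assignments]          # shape (n, d)
    # Squared Euclidean distance per point
    sq_dists = np.sum(diffs ** 2, axis=1)            # shape (n,)
    # Average over all points, as in the gist's final division
    return sq_dists.mean()

# Tiny example: two points, each distance 1 from its own centroid
pts = np.array([[0.0, 0.0], [2.0, 0.0]])
cents = np.array([[1.0, 0.0], [3.0, 0.0]])
assign = np.array([0, 1])
print(quantization_error(pts, cents, assign))  # 1.0
```

Is this per-point averaging the right interpretation, or should the error be averaged per cluster first?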
QUESTION: Is this calculation of the quantization error correct? If not, what are the steps to compute it?
Any help would be much appreciated.