You can think of the EML as a sort of RMSE applied to the CDFs of probability distributions.
Given N classes, all you need is a normalized probability score for each class of a sample; in neural networks this is obtained with a softmax activation function in the output layer.
The EML then simply compares the CDF of the predictions against the CDF of the ground truth.
In a classification problem with 10 classes, for a single sample, we can have these arrays:
y_true = [0,0,0,1,0,0,0,0,0,0] # the sample belongs to the 4th class
y_pred = [0.1,0,0,0.9,0,0,0,0,0,0] # probability output of the softmax layer
On them we compute the CDFs (cumulative sums) and get the following scores:
CDF_y_true = [0,0,0,1,1,1,1,1,1,1]
CDF_y_pred = [0.1,0.1,0.1,1,1,1,1,1,1,1]
As defined above, the EML computes the RMSE on these CDFs:
import numpy as np
from keras import backend as K

y_true = np.asarray([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
y_pred = np.asarray([0.1, 0., 0., 0.9, 0., 0., 0., 0., 0., 0.])
cdf_true = K.cumsum(y_true, axis=-1)  # CDF of the ground truth
cdf_pred = K.cumsum(y_pred, axis=-1)  # CDF of the prediction
emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))  # RMSE of the two CDFs
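For training, the same computation can be wrapped as a Keras loss and passed to model.compile. This is a minimal sketch, assuming the standard (y_true, y_pred) loss signature; the name earth_mover_loss is just a placeholder, not taken from the paper:

from keras import backend as K

def earth_mover_loss(y_true, y_pred):
    # CDFs along the class axis, one row per sample in the batch
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # RMSE between the two CDFs for each sample; Keras averages over the batch
    return K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))

# usage: model.compile(optimizer='adam', loss=earth_mover_loss)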
In the specific case of the NIMA paper by Google on TID2013, N=10 and the labels are expressed as float scores. In order to train the network with the EML, these are the steps to follow:
- digitize the float scores into 10 intervals (bins)
- one-hot encode the binned labels so they can be compared with the softmax probabilities, then minimize the EML (see the sketch after this list)
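A minimal sketch of this label preparation, assuming the float scores lie in the range [1, 10] with integer bin edges; the variable names and exact binning are illustrative only:

import numpy as np

scores = np.asarray([3.7, 8.2, 5.0])                  # example float quality scores
bin_idx = np.digitize(scores, np.arange(1, 11)) - 1   # bin index in [0, 9]
y_true = np.eye(10)[bin_idx]                          # one-hot encoded labels, shape (3, 10)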
At the end of training, our NN is able to produce, for a given image, a probability score for each class.
We then have to transform these scores into a mean quality score with an associated standard deviation, following the procedure defined in the paper:
bins = [1,2,3,4,5,6,7,8,9,10]
y_pred = [0.1,0,0,0.9,0,0,0,0,0,0] # probability output of the softmax layer
mu_score = sum(bins*y_pred) = 1*0.1 + 2*0 + 3*0 + 4*0.9 + ... + 10*0 = 3.7
sigma_score = (sum(((bins - mu_score)**2)*y_pred))**0.5 = ((1-3.7)**2*0.1 + (4-3.7)**2*0.9)**0.5 = 0.9
import numpy as np

bins = np.arange(1, 11)  # score bins 1..10
y_pred = np.asarray([0.1, 0., 0., 0.9, 0., 0., 0., 0., 0., 0.])
mu_score = np.sum(bins * y_pred)                              # mean quality score
std_score = np.sum(((bins - mu_score) ** 2) * y_pred) ** 0.5  # standard deviation
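Running this snippet reproduces the worked example above: mu_score = 3.7 and std_score = 0.9.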