Suppose I have a training set of $(x, y)$ pairs, where $x$ is the input example and $y$ is the corresponding target, a value in $\{1, \dots, k\}$ ($k$ is the number of classes).
When calculating the likelihood of the training set, should it be calculated over the whole training set (all of the examples), that is:

$$L = P(y \mid x) = p(y_1 \mid x_1) \cdot p(y_2 \mid x_2) \cdots$$

Or is the likelihood computed for a specific training example $(x, y)$?
I'm asking because I saw these lecture notes (page 2), where he seems to calculate $L_i$, that is, the likelihood of each training example separately.
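To make the two quantities concrete, here is a minimal Python sketch. It assumes the examples are independent (which is what the product formula above implies) and that some model outputs a probability for each of the $k$ classes. It computes the per-example terms $L_i = p(y_i \mid x_i)$ and the whole-set likelihood as their product, plus the log-likelihood as a sum. The function names and the toy probabilities are hypothetical, chosen only for illustration.

```python
import math

def per_example_likelihoods(probs, labels):
    """Return L_i = p(y_i | x_i) for each training example.

    probs[i][c] is a (hypothetical) model's predicted probability that
    x_i belongs to class c + 1; labels[i] is y_i in {1, ..., k}.
    """
    # Labels are 1-based, list indices are 0-based.
    return [row[y - 1] for row, y in zip(probs, labels)]

def dataset_likelihood(probs, labels):
    """L = prod_i p(y_i | x_i), assuming independent examples."""
    return math.prod(per_example_likelihoods(probs, labels))

def dataset_log_likelihood(probs, labels):
    """log L = sum_i log p(y_i | x_i)."""
    return sum(math.log(l) for l in per_example_likelihoods(probs, labels))

# Toy data: n = 3 examples, k = 3 classes (made-up probabilities).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [1, 2, 3]

L_i = per_example_likelihoods(probs, labels)   # [0.7, 0.8, 0.4]
L = dataset_likelihood(probs, labels)          # approximately 0.7 * 0.8 * 0.4 = 0.224
log_L = dataset_log_likelihood(probs, labels)  # approximately log(0.224)
```

Summing logs instead of multiplying raw probabilities is the usual choice in practice, because a product of many numbers below 1 quickly underflows to zero as the number of examples grows.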