Please see below. If you want more details of the mathematics involved, you might be better off posting on Cross Validated.
Could someone outline why the log-sum-exp trick is/needs to be done?
This is for numerical stability. If you search for "logsumexp" you will find several useful explanations, e.g. https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp and the question "log-sum-exp trick why not recursive". Essentially, the procedure avoids the numerical errors (overflow and underflow) that occur when exponentiating numbers that are very large or very small.
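For concreteness, here is a minimal sketch of the trick in Python/NumPy (the function name and the test values are my own, purely for illustration):

```python
import numpy as np

def log_sum_exp(v):
    """Compute log(sum(exp(v))) in a numerically stable way.

    Subtracting the maximum first keeps the exponentiated values in a
    range where they neither overflow nor all underflow to zero.
    """
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

v = np.array([-1000.0, -1001.0])
print(np.log(np.sum(np.exp(v))))  # -inf: exp(-1000) underflows to 0
print(log_sum_exp(v))             # about -999.69, the correct value
```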
specifically what the argument Li,: reads as
The i means "take the ith row", and the : means "take all values from that row". So, overall, Li,: means the ith row of L. The colon : is used in Matlab (and its open source derivative Octave) to mean "all indices" when subscripting vectors or matrices.
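NumPy happens to use the same slicing convention, so if it helps, here is a tiny illustration (the matrix values are made up):

```python
import numpy as np

L = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])

# L[i, :] selects the ith row and all columns -- the same meaning as
# Li,: in the book (NumPy indexing is 0-based, Matlab/Octave is 1-based).
print(L[1, :])  # [0.3 0.4]
```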
could someone give me a good notion of what the two values in line 8 of algorithm 3.1 are for?

One of them is the frequency with which class c appears in the training examples. Adding a hat indicates that this frequency is to be used as an estimate of the probability of class c appearing in the population as a whole; in terms of Naive Bayes, these probabilities are the priors.
And similarly, the other value is an estimate of the probability of the jth feature appearing when you restrict your attention to class c. These are the conditional probabilities, P(j|c) = probability of seeing feature j given class c, and the "naive" in Naive Bayes means that we assume the features are independent given the class.
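If it helps, here is a rough Python/NumPy sketch of how these two kinds of estimates could be computed for binary features. The function and variable names (X, y, pi, theta) are my own, not the book's, and I have left out any pseudocount smoothing the book may apply:

```python
import numpy as np

def fit_naive_bayes(X, y, n_classes=2):
    """Estimate class priors and per-class feature probabilities.

    X: (N, D) matrix of binary features, one row per training example.
    y: length-N vector of class labels in {0, ..., n_classes - 1}.
    Assumes each class occurs at least once in the training set.
    """
    N, D = X.shape
    pi = np.zeros(n_classes)          # pi[c]: estimate of P(class = c), the prior
    theta = np.zeros((n_classes, D))  # theta[c, j]: estimate of P(feature j = 1 | class = c)
    for c in range(n_classes):
        Xc = X[y == c]                # training examples belonging to class c
        pi[c] = Xc.shape[0] / N       # frequency of class c in the training set
        theta[c] = Xc.mean(axis=0)    # frequency of each feature within class c
    return pi, theta
```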
Note: the quotes from your question have been modified a little for clarity / convenience of exposition.
Edit in reply to your comment
- Li,: is a vector. N is the number of training examples and D is the dimension of the data, i.e. the number of features (each feature is a column in the matrix x, whose rows are training examples).
- What is Li,:? Each entry Li,c looks like the log of: the prior for class c times the product, over the features of example i, of the conditional probabilities of seeing those features given class c. Note that there are only two entries in the vector Li,:, one for each class (it's binary classification, so there are just two classes).
Using Bayes' theorem, the entries of Li,: can be interpreted as the logs of the relative conditional probabilities of training example i being in class c, given the features of i (strictly speaking they are not probabilities, because each would need to be divided by the same normalizing constant, but since that constant is common to both classes we can safely ignore it).
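Continuing the sketch above (again, the names are mine, and this assumes binary features with the Bernoulli model), each entry of Li,: could be computed roughly like this:

```python
import numpy as np

def log_joint(x, pi, theta):
    """Return the vector Li,: for a single example x with binary features.

    Entry c is log(pi[c]) plus the sum over features j of
    log P(x[j] | class c), using P(x_j = 1 | c) = theta[c, j].
    Assumes every theta[c, j] is strictly between 0 and 1.
    """
    log_prior = np.log(pi)
    log_lik = x @ np.log(theta).T + (1 - x) @ np.log(1 - theta).T
    return log_prior + log_lik
```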
I'm not sure about line 6 of algorithm 3.2. If all you need to do is figure out which class your training example belongs to, then to me it seems sufficient to omit line 6 and, for line 7, use argmax_c Lic. Perhaps the author included line 6 because pic has a particular interpretation?
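For what it's worth, here is a small illustration of that last point (the numbers are made up): normalizing Li,: with the log-sum-exp trick produces values pic that sum to one, and so can be read as posterior probabilities, while leaving the argmax unchanged.

```python
import numpy as np

L_i = np.array([-12.3, -15.8])   # a made-up Li,: vector for one example

# Line-6-style normalization: subtract logsumexp(Li,:) and exponentiate.
m = L_i.max()
p_i = np.exp(L_i - (m + np.log(np.exp(L_i - m).sum())))

print(p_i, p_i.sum())                    # posterior-like values summing to 1
print(np.argmax(p_i) == np.argmax(L_i))  # True: the argmax is unchanged
```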