I'm trying to wrap my head around pymc3, which seems to be a beautiful package. One thing I don't understand is why all of the probabilities are expressed as logs. The description of the Beta distribution says it is the "Beta log-likelihood", and the functions for evaluating the distribution are "logp" and "logcdf". I've seen references to "logp" in other places as well, but haven't seen any indication of why we are taking the log. I'm afraid I might be missing something fundamental. Thanks for any information.
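For reference, here is the kind of check I've been running (a minimal sketch, assuming the PyMC3 3.x API and arbitrary Beta parameters). Exponentiating "logp" matches the ordinary Beta density from scipy, so I can see *what* these functions return; it's the *why* I'm unclear on.

```python
import numpy as np
import pymc3 as pm
from scipy import stats

# Beta(2, 5) chosen arbitrarily.
beta = pm.Beta.dist(alpha=2.0, beta=5.0)

log_density = beta.logp(0.3).eval()   # log p(x) as a plain number
print(np.exp(log_density))            # back-transform: the ordinary pdf
print(stats.beta.pdf(0.3, 2, 5))      # same value from scipy

print(np.exp(beta.logcdf(0.3).eval()))  # and the ordinary cdf from "logcdf"
print(stats.beta.cdf(0.3, 2, 5))
```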
-
Hi Benjamin. While your question is definitely interesting (and fundamentally a good one), it is probably not a very good fit for Stack Overflow, which deals with *specific coding* issues. That aside, take a look at this [post on Math.SE](https://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution) for an interesting discussion and a lot of details. The question is framed around normal distributions, but the answer(s) are more fundamental. – Maurits Evers Dec 05 '19 at 00:30
-
@MauritsEvers: That was a good read. If you want to add it as an answer I'd be happy to mark it the best, or if you feel it more appropriate I'd be happy to take the question down entirely. Does this mean that I can treat the log-likelihood as if it were the likelihood for practical purposes? I.e., the 25% point of the cumulative distribution is the same either way, etc.? – benjaminjsanders Dec 05 '19 at 15:32
-
*"treat the log likelihood as if it were the likelihood"* Well, the log-likelihood is exactly that: The log-transform of the likelihood. Following optimisation convention we usually *minimise* a function (the negative log-likelihood in MLE estimation) instead of *maximising* the likelihood itself. And that's not even mentioning the benefits of numeric stability. – Maurits Evers Dec 06 '19 at 00:51
-
PS. Another interesting post on [Cross Validated](https://stats.stackexchange.com/questions/289190/theoretical-motivation-for-using-log-likelihood-vs-likelihood). – Maurits Evers Dec 06 '19 at 00:51
1 Answer
Rather than repeat, and not do justice to, what is said in the excellent posts on Math.SE and Cross Validated, I thought I'd point out another nice connection between probabilities and the logarithm.
The principle of maximum entropy goes back to a 1957 publication by the physicist (and statistician) E. T. Jaynes; it can be used to construct the most general (i.e. least informative) probability distribution that, given a set of constraints, maximises the (information) entropy.
For example, let's say the only thing we know about a probability distribution is that it has a certain mean μ and variance σ². Following the principle of maximum entropy, we can show that the least informative distribution satisfying these constraints is the normal distribution with mean μ and variance σ².
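As a quick numerical illustration of that claim (a sketch using scipy, with scale parameters chosen only so that all three distributions share mean 0 and variance 1): among a few common distributions with the same mean and variance, the normal has the largest differential entropy.

```python
import numpy as np
from scipy import stats

# Three distributions, all with mean 0 and variance 1:
#   normal:  variance = sigma^2        -> sigma = 1
#   laplace: variance = 2 * b^2        -> b = 1/sqrt(2)
#   uniform: variance = width^2 / 12   -> width = 2*sqrt(3)
candidates = {
    "normal": stats.norm(loc=0.0, scale=1.0),
    "laplace": stats.laplace(loc=0.0, scale=1.0 / np.sqrt(2)),
    "uniform": stats.uniform(loc=-np.sqrt(3), scale=2 * np.sqrt(3)),
}

# Differential entropies: the normal comes out on top.
for name, dist in candidates.items():
    print(name, float(dist.entropy()))
# normal  1.4189...
# laplace 1.3466...
# uniform 1.2425...
```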
So how does the logarithm come into play in all this? During the process of maximising the entropy we (very early on) end up with an equation involving the logarithm of the probability distribution,

log p(x) = λ₀ + λ₁ x + λ₂ x²,

where the λᵢ are constants (the Lagrange multipliers) that can be determined from the aforementioned set of constraints. In other words, the maximisation naturally produces a statement about the *logarithm* of the density, which is one more reason why log-densities (such as PyMC3's "logp") are the natural objects to work with.
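One way to see that equation in action (a sketch, not part of the derivation itself): compute the log-density of a normal distribution on a grid and fit a quadratic to it; the fit is exact, i.e. log p(x) really has the form λ₀ + λ₁ x + λ₂ x².

```python
import numpy as np
from scipy import stats

mu, sigma = 1.5, 2.0
x = np.linspace(-5, 8, 200)

# Log-density of N(mu, sigma^2) on a grid.
log_p = stats.norm.logpdf(x, loc=mu, scale=sigma)

# Fit a degree-2 polynomial: the fit is exact, so log p(x) is quadratic in x.
lam2, lam1, lam0 = np.polyfit(x, log_p, deg=2)
print(lam2, lam1, lam0)

# Analytically: lam2 = -1/(2 sigma^2), lam1 = mu/sigma^2,
#               lam0 = -mu^2/(2 sigma^2) - log(sigma * sqrt(2 pi))
print(-1 / (2 * sigma**2), mu / sigma**2,
      -mu**2 / (2 * sigma**2) - np.log(sigma * np.sqrt(2 * np.pi)))
```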
