I am wondering which technique is used to learn the Dirichlet priors in Mallet's LDA implementation.
Chapter 2 of Hanna Wallach's Ph.D. thesis gives a great overview and a valuable evaluation of existing and new techniques to learn the Dirichlet priors from the data.
Tom Minka initially provided his well-known fixed-point iteration approach, though without any evaluation or recommendations.
Furthermore, Jonathan Chuang compared several previously proposed methods, including the Newton-Raphson method.
LiangJie Hong says the following in his blog:
A typical approach is to use a Monte-Carlo EM approach, where the E-step is approximated by Gibbs sampling while the M-step performs a gradient-based optimization of the Dirichlet parameters. Such an approach is implemented in the Mallet package.
Mallet's documentation mentions Minka's fixed-point iterations both with and without histograms.
However, the documentation of the method that is actually used simply states:
Learn Dirichlet parameters using frequency histograms
Could someone provide a reference that describes the technique actually used?