You may want to read the attachment in this post, which gives a simple derivation:
http://www.win-vector.com/blog/2011/09/the-equivalence-of-logistic-regression-and-maximum-entropy-models/
The following explanation is quoted from "Speech and Language Processing" by Daniel Jurafsky & James H. Martin:
Each feature is an indicator function, which picks out a subset of the training observations. For each feature we add a constraint on our total distribution, specifying that our distribution for this subset should match the empirical distribution we saw in our training data. We then choose the maximum entropy distribution which otherwise accords with these constraints.
Berger et al. (1996) show that the solution to this optimization problem turns out to be exactly the probability distribution of a multinomial logistic regression model whose weights W maximize the likelihood of the training data!
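Roughly, the argument in the linked derivation runs as follows (the notation here is mine, following the Berger et al. setup: $\tilde{p}$ is the empirical distribution and the $f_i$ are the indicator features). The conditional maximum entropy problem is

$$
\max_{p}\; H(p) \;=\; -\sum_{x,y} \tilde{p}(x)\, p(y\mid x)\,\log p(y\mid x)
$$

subject to the moment constraints that model expectations match empirical expectations,

$$
\sum_{x,y} \tilde{p}(x)\, p(y\mid x)\, f_i(x,y) \;=\; \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y) \quad \text{for all } i,
$$

plus normalization $\sum_y p(y\mid x) = 1$. Introducing one Lagrange multiplier $w_i$ per constraint and solving gives the softmax form

$$
p(y\mid x) \;=\; \frac{\exp\!\big(\sum_i w_i f_i(x,y)\big)}{\sum_{y'} \exp\!\big(\sum_i w_i f_i(x,y')\big)},
$$

and the dual objective in the $w_i$ is exactly the conditional log-likelihood of a multinomial logistic regression, which is why the two problems share the same solution.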
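If you want to see the moment-matching constraints concretely, here is a small numerical sketch (my own illustration, not from the linked post): at the unregularized multinomial-logistic MLE, the gradient of the log-likelihood, $X^\top(P - Y)$, vanishes, so the model's expected feature counts equal the empirical ones. The synthetic data and the huge `C` (to effectively switch off scikit-learn's L2 penalty) are assumptions of the sketch.

```python
# Numerical check: maxent constraints hold at the (nearly) unpenalized
# multinomial logistic regression MLE.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
W_true = rng.normal(size=(3, 3))
# Gumbel-max trick: sample labels from a true softmax model over 3 classes.
y = np.argmax(X @ W_true.T + rng.gumbel(size=(500, 3)), axis=1)

# C is huge so the L2 penalty is negligible; this approximates the MLE.
clf = LogisticRegression(C=1e10, max_iter=10_000).fit(X, y)
P = clf.predict_proba(X)   # model probabilities p(y = k | x_i)
Y = np.eye(3)[y]           # one-hot empirical labels

empirical = Y.T @ X        # sum_i 1[y_i = k] * x_ij
expected = P.T @ X         # sum_i p(k | x_i) * x_ij
print(np.max(np.abs(empirical - expected)))  # ~0, up to solver tolerance
```

The printed discrepancy is essentially zero: the fitted distribution matches the empirical feature expectations, which is exactly the constraint set the maximum entropy formulation imposes.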