To my understanding, Logistic Regression can be derived as an extension of (Gaussian) Naive Bayes. Suppose
X = (X_1, X_2, ..., X_N); Y = {0, 1}, where the X_i are conditionally independent given Y and
each P(X_i|Y=y_k) is a Gaussian distribution.
So, in order to get a linear decision surface, we additionally assume that each pdf P(X_i|y_k) has a variance sigma_i that is independent of the value of Y, i.e. sigma_(i,k) = sigma_i (where i indexes X_i and k indexes y_k).
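For concreteness, if I expand the log-likelihood ratio for a single feature under this shared-variance assumption, the quadratic terms cancel:

$$
\ln\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)}
= \frac{(X_i-\mu_{i1})^2 - (X_i-\mu_{i0})^2}{2\sigma_i^2}
= \frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}\,X_i + \frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2},
$$

so each feature contributes a term that is linear in X_i precisely because sigma_i does not depend on the class.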
Finally, we end up learning the coefficients (w_0, w_i) that represent the linear decision surface in the following equation:

ln(P(Y=0|X)/P(Y=1|X)) = w_0 + sum_i(w_i*X_i)    (linear decision surface)
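As a sanity check on the algebra, here is a small numerical sketch (the parameter values are made up; the point is only that the weights implied by the Naive Bayes parameters reproduce the exact Bayes log-odds):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up GNB parameters; sigma is shared across the two classes as assumed above.
pi = 0.4                      # P(Y=1)
mu0 = np.array([0.0, 1.0])    # mu_(i,0)
mu1 = np.array([2.0, -1.0])   # mu_(i,1)
sigma = np.array([1.5, 0.7])  # sigma_i

# Weights implied by the derivation above.
w = (mu0 - mu1) / sigma**2
w0 = np.log((1 - pi) / pi) + np.sum((mu1**2 - mu0**2) / (2 * sigma**2))

def bayes_log_odds(x):
    """Exact ln P(Y=0|x)/P(Y=1|x) from the Gaussian class-conditionals."""
    log_p0 = np.log(1 - pi) - np.sum(((x - mu0) / sigma) ** 2) / 2
    log_p1 = np.log(pi) - np.sum(((x - mu1) / sigma) ** 2) / 2
    return log_p0 - log_p1  # the -ln(sigma*sqrt(2*pi)) terms cancel

x = rng.normal(size=2)
print(bayes_log_odds(x), w0 + w @ x)  # equal up to floating-point error
```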
Even though the derivation of the Logistic Regression coefficients (w_0, w_i) itself relies on the assumption that the X_i are conditionally independent given Y:

- Why is it said that learning these coefficients from training data is somewhat more free of the conditional independence assumption than learning the usual Naive Bayes parameters (mu, sigma)?
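To make the question concrete, here is the kind of comparison I have in mind (just a sketch using sklearn; I duplicate a feature so that conditional independence given Y is violated as badly as possible):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# One informative Gaussian feature: mean 0 under Y=0, mean 2 under Y=1, sigma = 1.
n = 5000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)

# Duplicate it: X_1 and X_2 are now perfectly dependent given Y.
X = np.column_stack([x, x])

gnb = GaussianNB().fit(X, y)
lr = LogisticRegression().fit(X, y)

# True posterior at x = 1.5 is sigmoid(2*1.5 - 2) ~= 0.73.
# Naive Bayes counts the duplicated evidence twice; Logistic Regression
# is free to split the weight across the two copies instead.
print("GNB P(Y=1 | x=1.5):", gnb.predict_proba([[1.5, 1.5]])[0, 1])
print("LR  P(Y=1 | x=1.5):", lr.predict_proba([[1.5, 1.5]])[0, 1])
print("LR weights:", lr.coef_[0], "intercept:", lr.intercept_[0])
```

If my reasoning is right, the generatively estimated (mu, sigma) make Naive Bayes overconfident here, while the discriminatively learned (w_0, w_i) compensate for the dependence, but I would like to understand why this holds in general.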
I came across this while following this course here.
Any clarification/suggestion would be very helpful. Thanks