I want to compute the posterior probabilities within the linear discriminant analysis framework, to choose a class with maximum posterior probability given X.
I got the results provided by the function MASS::lda(). Now i want to compute these results using my own code. However, i am not getting exatly the same results. Below is a reprex. Can anyone point to what i am missing? Thank you.
# x is a numeric vector (variable)
x <- matrix(c(5,6,5,7,2,1,3,2,4,2,4 ), ncol=1, byrow=T)
# y are the k (classes)
y <- c(1,1,1,1,1,1,1,2,2,2,2)
# merge into data.frame
df <- data.frame(x,y)
# plot values over a line
plot(x, col = y, pch= y, cex = 1.5)
# run lda with MASS
library(MASS)
mlda <- lda(y ~ ., data = df)
# predict class
m_prev <- predict(mlda)
# see posterior probabilities
m_prev$posterior
# now i run code to compute
# the posteriors:
# prior mean each k
prior1 <- sum(df$y==1)/nrow(df)
prior2 <- sum(df$y==2)/nrow(df)
# mean(x) each k
m1 <- round(mean(df[df$y==1,"x"]),2)
m2 <- round(mean(df[df$y==2,"x"]),2)
# standard deviation
s = sd(df$x)
# f(x) each k
fnorm1 <- dnorm(x, mean = m1, sd = s)
fnorm2 <- dnorm(x, mean = m2, sd = s)
# and finaly the posterior probabilites k=1 using the bayes theorem
# p(y=k|x=x) = fk(x)p(k) / sum(fi(x)p(i))
post1 <- prior1 * fnorm1 / (prior1 * fnorm1 + prior2 * fnorm2)
# it is not the same as the posterior probabilities
# obtained with MASS::lda()
cbind(post1, m_prev$posterior[,1])