0

I want to compute the posterior probabilities within the linear discriminant analysis framework, to choose a class with maximum posterior probability given X.

I got the results provided by the function MASS::lda(). Now i want to compute these results using my own code. However, i am not getting exatly the same results. Below is a reprex. Can anyone point to what i am missing? Thank you.

# x is a numeric vector (variable)  
x <- matrix(c(5,6,5,7,2,1,3,2,4,2,4 ), ncol=1, byrow=T)

# y are the k (classes)
y <- c(1,1,1,1,1,1,1,2,2,2,2)

# merge into data.frame
df <- data.frame(x,y)

# plot values over a line
plot(x, col =  y, pch= y, cex = 1.5)

# run lda with MASS
library(MASS)
mlda <- lda(y ~ ., data = df)

# predict class
m_prev <- predict(mlda)

# see posterior probabilities
m_prev$posterior

# now i run code to compute 
# the posteriors:

# prior mean each k
prior1 <- sum(df$y==1)/nrow(df)
prior2 <- sum(df$y==2)/nrow(df)

# mean(x) each k
m1 <- round(mean(df[df$y==1,"x"]),2)
m2 <- round(mean(df[df$y==2,"x"]),2)

# standard deviation
s = sd(df$x)

# f(x) each k 
fnorm1 <- dnorm(x, mean = m1, sd = s)
fnorm2 <- dnorm(x, mean = m2, sd = s)

# and finaly the posterior probabilites k=1 using the bayes theorem
# p(y=k|x=x) = fk(x)p(k) / sum(fi(x)p(i))
post1 <- prior1 * fnorm1 / (prior1 * fnorm1 + prior2 * fnorm2)

# it is not the same as the posterior probabilities 
# obtained with MASS::lda()
cbind(post1, m_prev$posterior[,1])
maluicr
  • 37
  • 6

0 Answers0