I need to run an EM algorithm for a mixture of three normal distributions with unknows means and variances. My data points are a column with 500 rows. I am gonna take it as 'S'.
First I need to write a function for negative log-likelihood of the mixture model. Here is what I wrote:
S <- df$Height
neg.logl = function(theta, p1, p2,x){
mu1 = theta[1]
mu2 = theta[2]
mu3 = theta[3]
sigma1 = exp(theta[4])
sigma2 = exp(theta[5])
sigma3 = exp(theta[6])
f.x = p1 * dnorm(x, mu1, sigma1) +
p2*dnorm(x, mu2, sigma2) +
(1-p1-p2)*dnorm(x, mu3, sigma3)
sum(-log(f.x))
}
Then I need to run the EM algorithm for 10 iterations for three different starting points. I am gonna take one starting point at one time and run the algorithm for three times seperately.
set.seed(30000)
max.iter = 10; n=nrow(df)
p1.comp1 = matrix(NA, max.iter, n)
p2.comp1 = matrix(NA, max.iter, n)
theta_EM = matrix(NA, max.iter, 6)
p1.comp1[1, ] = runif(n,0.1,0.3)
p2.comp1[1, ] = runif(n,0.2,0.3)
# Running first M step
theta_EM[1, ] = optim(c(-0.5, 0.5, -0.2, 0.2), function(theta)
neg.logl(theta, p1.comp1[1,],p2.comp1[1, ], S))$par
# Running alternate E and M steps
for(i in 2:max.iter){
# E step: update probability memberships
p1.temp = cbind(dnorm(S, theta_EM[i-1, 1], exp(theta_EM[i-1, 4])),
dnorm(S, theta_EM[i-1, 2], exp(theta_EM[i-1, 5])),
dnorm(S, theta_EM[i-1, 3], exp(theta_EM[i-1, 6])))
p2.temp = cbind(dnorm(S, theta_EM[i-1, 1], exp(theta_EM[i-1, 4])),
dnorm(S, theta_EM[i-1, 2], exp(theta_EM[i-1, 5])),
dnorm(S, theta_EM[i-1, 3], exp(theta_EM[i-1, 6])))
p1.comp1[i, ] = p1.temp[ , 1] / rowSums(p1.temp)
p2.comp1[i, ] = p2.temp[ , 1] / rowSums(p2.temp)
# M step: update parameter estimates
theta_EM[i, ] = optim(theta_EM[i-1, ],
function(theta) neg.logl(theta, p1.comp1[i,],p2.comp1[i, ], S))$par
}
#Final
theta.last = theta_EM[max.iter, ]
p1.temp = cbind(dnorm(S, theta.last[1], exp(theta.last[4])),
dnorm(x, theta.last[2], exp(theta.last[5])))
p2.temp = cbind(dnorm(S, theta.last[2], exp(theta.last[5])),
dnorm(x, theta.last[2], exp(theta.last[6])))
p3.temp = cbind(dnorm(S, theta.last[3], exp(theta.last[6])),
dnorm(x, theta.last[2], exp(theta.last[6])))
p1.last = p1.temp[ , 1]/rowSums(p1.temp)
p2.last = p2.temp[ , 1]/rowSums(p2.temp)
p2.last = p3.temp[ , 1]/rowSums(p3.temp)
logl.last = - neg.logl(theta.last, p1.last, p2.last,S)
cat("EP:",
c(theta.last[1], theta.last[2], theta.last[3],exp(theta.last[4]),
exp(theta.last[5],exp(theta.last[6])),
"\nlogl:", logl.last)
Then I get so many errors! Please help