
Good afternoon!

I have asked this question several times without receiving any reply or comment (even on Cross Validated)!

In R, I have implemented the expectation-maximization (EM) algorithm for a Gaussian mixture model:

k = 3                # number of known clusters
w_k = rep(1, k) / k  # initialize each cluster weight to 1/k
n_j = rep(0, k)      # will be used later
print(w_k)
data = as.matrix(iris[1:150, -5]) # numerical data with 4 dimensions; the fifth column holds the class labels

means = sample(1:dim(data)[1], k, replace = FALSE) # choose k row indices; those rows become the initial cluster means
mu = as.matrix(iris[means, -5])   # the randomly chosen means, as a matrix of k rows and 4 columns
sigma = cov(data)                 # covariance matrix of the whole dataset
print(sigma)
print(solve(sigma))
sigma_list = rep(list(sigma), k)  # initialize each of the k cluster covariance matrices to sigma
 
# w_k, mu and sigma_list are the 3 parameters updated during the M-step of expectation maximization (the EM algorithm).

# Function to compute P(Xi|Cj), i.e. the likelihood of an observation given a cluster.
# Example: P_Xi_Cj(Xi = data[1,], Cj = mu[3,], sigma = sigma) gives P(X1|C3),
# the likelihood of observation 1 given cluster 3.
P_Xi_Cj <- function(Xi, Cj, sigma) {
    x = Xi
    mu = Cj
    d = length(x)
    # univariate case: x, mu and sigma are all scalars
    if (length(as.vector(sigma)) == 1 && length(as.vector(x)) == 1 && length(as.vector(mu)) == 1) {
        return(as.numeric(1 / sqrt(abs(sigma) * (2 * pi)^d) * exp(-1/2 * (x - mu)^2 / sigma)))
    }
    # multivariate case: (2*pi)^(-d/2) * |sigma|^(-1/2) * exp(-1/2 * (x - mu)' sigma^(-1) (x - mu))
    tr1 = matrix(x - mu, ncol = 1)
    tr2 = matrix(x - mu, nrow = 1)
    inverse = solve(sigma)
    total = tr2 %*% inverse %*% tr1
    return(as.numeric(1 / sqrt(abs(det(sigma)) * (2 * pi)^d) * exp(-1/2 * total)))
}
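To sanity-check this density function, one option is to compare it against dmvnorm from the mvtnorm package (this assumes mvtnorm is installed; it is not needed anywhere else in the code):

library(mvtnorm)                                        # only used for this check
print(P_Xi_Cj(Xi = data[1,], Cj = mu[3,], sigma = sigma))
print(dmvnorm(data[1,], mean = mu[3,], sigma = sigma))  # both lines should print the same value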



# Computing P(Cj|Xi): which cluster is most likely given the observation?

P_Cj_Xi <- function(Xi, mu, sigma_list, W_k = w_k, n_clusters = k) {
    # inverse covariance matrix of each cluster
    r = lapply(1:n_clusters, function(i) solve(matrix(unlist(sigma_list[i]), ncol = length(Xi))))
    # quadratic form -1/2 * (Xi - mu_j)' Sigma_j^(-1) (Xi - mu_j) for each cluster
    n_j = sapply(1:n_clusters, function(i) -1/2 * rbind(Xi - mu[i,]) %*% as.matrix(r[[i]]) %*% cbind(Xi - mu[i,]))
    # det(Sigma_j^(-1))^(1/2) equals det(Sigma_j)^(-1/2); the common factor (2*pi)^(-d/2) cancels in the ratio
    dets = sapply(1:n_clusters, function(i) abs(det(as.matrix(r[[i]])))^(1/2))
    total = sum(dets * exp(n_j) * W_k)
    P_C_Xi = dets * exp(n_j) * W_k / total
    names(P_C_Xi) = paste("P(Cj|Xi)", 1:n_clusters)
    return(P_C_Xi)
}
# Example: P_Cj_Xi(Xi = data[1,], mu = mu, sigma_list = sigma_list, W_k = w_k, n_clusters = k) gives P(Cj|X1) for j = 1, ..., n_clusters.
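Since these are posterior probabilities over the k clusters, they must sum to one for any observation, which gives a quick invariant check:

p = P_Cj_Xi(Xi = data[1,], mu = mu, sigma_list = sigma_list, W_k = w_k, n_clusters = k)
stopifnot(abs(sum(p) - 1) < 1e-8)  # the posterior over all clusters sums to 1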
                
M_step <- function(data, mu, sigma_list, W_k, n_clusters = k) {
    # E-step: posterior P(Cj|Xi) for every observation
    l = lapply(1:nrow(data), function(i) P_Cj_Xi(Xi = data[i,], mu, sigma_list, W_k, n_clusters = n_clusters))
    sum_P_Cj_Xi = as.vector(Reduce("+", l))  # effective number of points per cluster
    # M-step: update the cluster weights (equation (7) of the referenced paper)
    W_j = sum_P_Cj_Xi / nrow(data)
    # update the cluster means
    mu = t(sapply(1:n_clusters, function(j) Reduce("+", lapply(1:nrow(data), function(i) l[[i]][j] * data[i,] / sum_P_Cj_Xi[j]))))
    # update the cluster covariance matrices (using the freshly updated means)
    sigma = lapply(1:n_clusters, function(j) Reduce("+", lapply(1:nrow(data), function(i) l[[i]][j] * (data[i,] - mu[j,]) %*% t(data[i,] - mu[j,]) / sum_P_Cj_Xi[j])))
    print(list(mu, sigma, W_j)) # printing for debugging
    return(list(mu, sigma, W_j))
}
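The same kind of invariant holds after each update: the new weights (the third element of the returned list) must again sum to one. A quick check on a single update:

fit = M_step(data = data, mu = mu, sigma_list = sigma_list, W_k = w_k, n_clusters = k)
stopifnot(abs(sum(fit[[3]]) - 1) < 1e-8)  # updated cluster weights still sum to 1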

max_iter = 6  # total number of EM iterations
for (t in 1:max_iter) {
    print(c("iteration: ", t))
    if (t == 1) {
        # the first iteration starts from the initial parameters
        tmp = M_step(data = data, mu = mu, sigma_list = sigma_list, W_k = w_k, n_clusters = k)
    } else {
        # subsequent iterations start from the previous update
        tmp = M_step(data = data, mu = tmp[[1]], sigma_list = tmp[[2]], W_k = tmp[[3]], n_clusters = k)
    }
}
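To check that these updates really behave like EM, the total log-likelihood can be tracked across iterations; EM guarantees it is non-decreasing. A sketch of such a check, reusing P_Xi_Cj from above:

# Total log-likelihood of the data under the current mixture parameters:
# sum over i of log( sum over j of w_j * P(Xi|Cj) )
log_lik <- function(data, mu, sigma_list, W_k, n_clusters = k) {
    sum(sapply(1:nrow(data), function(i)
        log(sum(sapply(1:n_clusters, function(j)
            W_k[j] * P_Xi_Cj(Xi = data[i,], Cj = mu[j,], sigma = sigma_list[[j]]))))))
}
# Printing log_lik(data, tmp[[1]], tmp[[2]], tmp[[3]], k) after each call to
# M_step in the loop above should show a non-decreasing sequence.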

This implementation is based on the equations explained in Part 2 of this paper: https://arxiv.org/ftp/arxiv/papers/1603/1603.07879.pdf

My question is simple. According to the EM algorithm:

Are the obtained outputs for the updated parameters (mu, sigma, w_k) correct, i.e. do they match the EM approach?

I hope my question is simple and clear!

Thank you in advance for your help!

Tou Mou
  • I don't think anyone is going to spend their time verifying your work; there is too much to test. In other words, your question is too broad. You should either ask specific questions about specific parts of your code, or test it yourself by running it on known outcomes and seeing whether it returns the same outputs. – user2974951 Oct 21 '20 at 12:22
  • @user2974951, the iris dataset has 3 known clusters. Each cluster has a mean vector and a covariance matrix. Should I compare those (3 means and 3 covariance matrices) with the means and covariance matrices computed by the R code to say that the algorithm converges as expected? – Tou Mou Oct 21 '20 at 13:36
  • EM is an algorithm to maximize the likelihood function. You can tell whether it's working by looking at the log-likelihood from one iteration to the next: it must be increasing. Bear in mind that EM is a local maximization -- if you start from different starting points, you might end up at different local maxima, and I would expect disjoint basins of attraction around each local maximum. Something else to try is to construct a problem for which you know the solution -- does your algorithm find it? Good luck and have fun. – Robert Dodier Oct 22 '20 at 16:15
  • To follow up on my previous comment: one kind of test is to let the initial parameters be the solution of a known problem. You should find that the algorithm just stays there. – Robert Dodier Oct 23 '20 at 02:56
  • @Robert Dodier, yes, this is the best way. Thank you very much for your valuable help! – Tou Mou Oct 23 '20 at 14:42
  • The question can be closed! (A sketch of that fixed-point test follows the thread.) – Tou Mou Oct 23 '20 at 14:43
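Following up on the fixed-point suggestion above, here is a sketch of that test (it assumes the functions defined in the question). The per-class sample means, covariances and weights of iris are used as the starting point; one EM iteration should then change the parameters only slightly (the iris classes overlap, so the supervised estimates are close to, but not exactly at, a fixed point of the mixture likelihood):

labels = iris[, 5]
mu_true = t(sapply(levels(labels), function(l) colMeans(data[labels == l, ])))  # 3 x 4 matrix of per-class means
sigma_true = lapply(levels(labels), function(l) cov(data[labels == l, ]))       # per-class covariance matrices
w_true = as.vector(table(labels)) / nrow(data)                                  # per-class weights (1/3 each)
fp = M_step(data = data, mu = mu_true, sigma_list = sigma_true, W_k = w_true, n_clusters = 3)
print(max(abs(fp[[1]] - mu_true)))  # should be small if the updates are correct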

0 Answers