I'm currently trying to impute the missing data through Gaussian mixture model. My reference paper is from here: http://mlg.eng.cam.ac.uk/zoubin/papers/nips93.pdf
I currently focus on bivariate dataset with 2 Gaussian components. This is the code to define the weight for each Gaussian component:
myData = faithful[,1:2]; # the data matrix
for (i in (1:N)) {
prob1 = pi1*dmvnorm(na.exclude(myData[,1:2]),m1,Sigma1); # probabilities of sample points under model 1
prob2 = pi2*dmvnorm(na.exclude(myData[,1:2]),m2,Sigma2); # same for model 2
Z<-rbinom(no,1,prob1/(prob1 + prob2 )) # Z is latent variable as to assign each data point to the particular component
pi1<-rbeta(1,sum(Z)+1/2,no-sum(Z)+1/2)
if (pi1>1/2) {
pi1<-1-pi1
Z<-1-Z
}
}
This is my code to define the missing values:
> whichMissXY<-myData[ which(is.na(myData$waiting)),1:2]
> whichMissXY
eruptions waiting
11 1.833 NA
12 3.917 NA
13 4.200 NA
14 1.750 NA
15 4.700 NA
16 2.167 NA
17 1.750 NA
18 4.800 NA
19 1.600 NA
20 4.250 NA
My constraint is, how to impute the missing data in "waiting" variable based on particular component. This code is my first attempt to impute the missing data using conditional mean imputation. I know, it is definitely in the wrong way. The outcome would not lie to the particular component and produce outlier.
miss.B2 <- which(is.na(myData$waiting))
for (i in miss.B2) {
myData[i, "waiting"] <- m1[2] + ((rho * sqrt(Sigma1[2,2]/Sigma1[1,1])) * (myData[i, "eruptions"] - m1[1] ) + rnorm(1,0,Sigma1[2,2]))
#print(miss.B[i,])
}
I would appreciate if someone could give any advice on how to improve the imputation technique that could work with latent/hidden variable through Gaussian mixture model. Thank you in advance