predicting from flexmix object (R)

Question

I fit some data to a mixture distribution of two gaussian in flexmix:

data("NPreg", package = "flexmix")
mod <- flexmix(yn ~ x, data = NPreg, k = 2,
           model = list(FLXMRglm(yn ~ x, family= "gaussian"),
                        FLXMRglm(yn ~ x, family = "gaussian")))

the model fit is as follows:

> mod

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, model =    list(FLXMRglm(yn ~ x, family = "gaussian"), 
    FLXMRglm(yn ~ x, family = "gaussian")))

Cluster sizes:
  1   2 
 74 126 

convergence after 31 iterations

But how do I actually predict from this model?

when I do

pred <- predict(mod, NPreg)

I get a list with the predictions from each of the two components

To get a single prediction, do I have to add in the cluster sizes like this?

single <- (74/200)* pred$Comp.1[,1] + (126/200)*pred$Comp.2[,2]

score 8 · Answer 1 · edited Apr 13 '17 at 12:44

I use flexmix for prediction in the following way:

pred = predict(mod, NPreg)
clust = clusters(mod,NPreg)
result = cbind(NPreg,data.frame(pred),data.frame(clust))
plot(result$yn,col = c("red","blue")[result$clust],pch = 16,ylab = "yn")

And the confusion matrix:

table(result$class,result$clust)

For getting the predicted values of yn, I select the component value of the cluster to which a data point belongs.

for(i in 1:nrow(result)){
  result$pred_model1[i] = result[,paste0("Comp.",result$clust[i],".1")][i]
  result$pred_model2[i] = result[,paste0("Comp.",result$clust[i],".2")][i]
}

The actual vs predicted results show the fit (adding only one of them here as both of your models are same, you would use pred_model2 for the second model).

qplot(result$yn, result$pred_model1,xlab="Actual",ylab="Predicted") + geom_abline()

RMSE = sqrt(mean((result$yn-result$pred_model1)^2))

gives a root mean square error of 5.54.

This answer is based on many SO answers I read through while working with flexmix. It worked well for my problem.

You may also be interested in visualizing the two distributions. My model was the following, which shows some overlap as the ratio of components are not close to 1.

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, 
model = list(FLXMRglm(yn ~ x, family = "gaussian"), 
             FLXMRglm(yn ~ x, family = "gaussian")))

       prior size post>0 ratio
Comp.1 0.481  102    129 0.791
Comp.2 0.519   98    171 0.573

'log Lik.' -1312.127 (df=13)
AIC: 2650.255   BIC: 2693.133

I also generate a density distribution with histograms to visulaize both components. This was inspired by a SO answer from the maintainer of betareg.

a = subset(result, clust == 1)
b = subset(result, clust == 2)
hist(a$yn, col = hcl(0, 50, 80), main = "",xlab = "", freq = FALSE, ylim = c(0,0.06))
hist(b$yn, col = hcl(240, 50, 80), add = TRUE,main = "", xlab = "", freq = FALSE, ylim = c(0,0.06))
ys = seq(0, 50, by = 0.1)
lines(ys, dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)), col = hcl(0, 80, 50), lwd = 2)
lines(ys, dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)), col = hcl(240, 80, 50), lwd = 2)

# Joint Histogram
p <- prior(mod)
hist(result$yn, freq = FALSE,main = "", xlab = "",ylim = c(0,0.06))
lines(ys, p[1] * dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)) +
        p[2] * dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)))

score 0 · Answer 2 · answered Jun 08 '17 at 08:05

0

You can pass an additional argument to your prediction call.

pred <- predict(mod, NPreg, aggregate = TRUE)[[1]][,1]

answered Jun 08 '17 at 08:05

vivek mishra

196
2
6

predicting from flexmix object (R)

2 Answers2