
I'm trying to understand flexmix, and specifically what I'm doing wrong when trying to fit the simplest conceivable binomial mixture model (a mixture of two intercept-only models).

library(dplyr)  # for %>% and mutate()
library(purrr)  # for map_dbl()

set.seed(42)
data=data.frame(class=rbinom(1000,size=1,prob=0.3)) %>% # 30% in class 1, 70% in class 0
  mutate(yes=map_dbl(class,~ifelse(.x,rbinom(1,1,prob=0.8),rbinom(1,1,prob=0.4)))) # P(yes) = 80% in class 1, 40% in class 0

# Algebraic (method-of-moments) check: P(yes) = pi*0.8 + (1-pi)*0.4,
# so pi = (P(yes) - 0.4) / (0.8 - 0.4)
(mean(data$yes==1)-0.4)/(0.8-0.4)
# = 0.295 -- this is what the model should recover, right?


library(flexmix)
# fix each component's intercept via an offset on the logit scale
mo1=FLXMRglm(offset=rep(log(.4/.6), nrow(data)),family="binomial")  # logit(0.4)
mo2=FLXMRglm(offset=rep(log(.8/.2), nrow(data)),family="binomial")  # logit(0.8)
flexfit <- flexmix(cbind(yes,1-yes) ~ -1, data=data, k=2, model=list(mo1, mo2))
flexfit
summary(flexfit)

This code returns all data points as belonging to the first class. Am I setting up the model incorrectly, or do I have a more fundamental misunderstanding about the way mixture models work?

1 Answer


Since you don't have regressors, I think you need to specify an intercept in your linear model, that is, cbind(yes, 1 - yes) ~ 1.

flexfit <- initFlexmix(cbind(yes, 1 - yes) ~ 1, data=data, k=2, model=list(mo1, mo2))
summary(flexfit)

splits the dataset into clusters of 482 and 518 units. The model has log-likelihood = -692.499, AIC = 1390.998, BIC = 1419.537.
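To inspect what the fitted mixture has actually done, flexmix's accessor functions are convenient; for example:

table(clusters(flexfit))   # hard cluster assignments (482 vs 518 units in this fit)
parameters(flexfit)        # fitted coefficients for each component model
prior(flexfit)             # estimated mixing proportions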

Note that mixtures of binomial distributions are in general not identifiable. It is not a coincidence that group 2 has 518 units: that is exactly the number of observations for which yes == 1. If we pass the labels to the model, we obviously recover the exact answer (making the model completely useless):

data$label <- ifelse(data$class == 1, 2, 1)
flexfit <- initFlexmix(cbind(yes, 1 - yes) ~ 1 | label, data=data, k=2, model=list(mo1, mo2))
summary(flexfit)

I hope somebody can prove me wrong, but I don't think there is a way to recover the hidden groups by only observing 0/1 outcomes and without strong prior assumptions.

Think of it this way: you can partition your binary observations into two clusters in many ways (the total number is a Stirling number of the second kind), and each allocation is a perfectly reasonable explanation for your data. The maximum likelihood estimator simply picks the most reasonable one: estimate the probability of the hidden class by the observed relative frequency, then set the conditional probabilities to 0 or 1 depending on the class.

Pr(Y = 1 | π, θ, ϕ) = Pr(C = 1) Pr(Y = 1 | C = 1) + Pr(C = 2) Pr(Y = 1 | C = 2) = π θ + (1 - π) ϕ

Set θ = 0, ϕ = 1, and π equal to the observed relative frequency of yes == 0 (so that 1 - π matches the observed frequency of yes == 1). The flexmix model is more complicated than this, but I am sure it can be reduced to my example, considering we only have an intercept.
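
As a rough numeric check of this argument, here is a sketch using the simulated data from the question (mix_loglik is just a hypothetical helper written for this answer, not a flexmix function):

# marginal log-likelihood of a two-component intercept-only binomial mixture
mix_loglik <- function(pi, theta, phi, y) {
  sum(log(pi * dbinom(y, 1, theta) + (1 - pi) * dbinom(y, 1, phi)))
}

y <- data$yes
mix_loglik(0.3, 0.8, 0.4, y)        # the values used to generate the data
mix_loglik(mean(y == 0), 0, 1, y)   # the degenerate solution described above
# the degenerate solution's log-likelihood is at least as high as the one
# obtained under the generating values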

  • OK, I think I have a fundamental misunderstanding about what flexmix is doing. I understand that it is not possible to figure out whether a *particular* observation is from group 1 or 2, but I'm only interested in the mixture proportions. Also, I *do* have strong priors - the intercepts. E.g., suppose 100 times someone flips either a coin (p=0.5) or a bent coin (p=0.9), and tells me the # of heads. I want the most likely # of times they chose the bent coin. Obviously it's trivial to solve that on pen and paper, but I want to apply this to a more complex problem with IVs, random effects, etc. – Nicholas Root Oct 04 '19 at 13:30
  • By fixing the probability of success for each component you are practically labeling the groups. You don't have a classical mixed model anymore and I don't think flexmix allows the definition of parameters specifically for each component. I tried to write a custom M-step driver as explained in the flexmix vignette, but I had to stop when I couldn't give the components their known probability. I am afraid you have to implement the algorithm yourself. – Alberto Pessia Oct 10 '19 at 15:07
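
Following up on the comments: if the component probabilities really are known and only the mixing proportion is unknown, the marginal likelihood can be maximized directly without flexmix. A minimal sketch, where fit_mix_weight is a hypothetical helper and not part of any package:

fit_mix_weight <- function(y, p1, p2) {
  # maximize the marginal likelihood over the mixing weight only,
  # with the component success probabilities p1 and p2 held fixed
  negloglik <- function(logit_pi) {
    pi <- plogis(logit_pi)
    -sum(log(pi * dbinom(y, 1, p1) + (1 - pi) * dbinom(y, 1, p2)))
  }
  plogis(optim(0, negloglik, method = "BFGS")$par)
}

fit_mix_weight(data$yes, p1 = 0.8, p2 = 0.4)
# should essentially reproduce the moment estimate from the question:
# (mean(data$yes) - 0.4) / (0.8 - 0.4) = 0.295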