This is a stupid question, but is there a way to feed categorical observations into the sklearn GMM module?

My data looks somewhat like:

User,Site_category,user_segment
UserA,Sports:News,efk-457
UserB,Music:Entertainment,asl-567
UserC,Sports:News,asl-567
UserD,Sports:News,efk-457

user_segment is the class label in my data set (there are about 10 classes). I see this as a mixture of 10 different distributions.

Given a test user and a site category, I want to know which class / distribution that test case most likely belongs to.

I know I can opt for a discriminative model, but I want to see how a generative model does in this case.

cryp
  • Using dummy variables for each category is a valid approach. Could you please provide more lines of data (>100) or upload a sample CSV file? I can show you how to convert categorical data to dummies using pandas. – Jianxun Li Jun 23 '15 at 16:04
  • A Gaussian distribution is not a good fit for your data from a statistical point of view. It may still work, though. You may be better off with multinomials or the like. – user1669710 May 03 '16 at 17:25

1 Answer

The StepMix package follows the sklearn interface. Here's how you can fit categorical/multinoulli mixtures:

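from stepmix.stepmix import StepMix

# Categorical StepMix Model with 3 latent classes
model = StepMix(n_components=3, measurement="categorical", verbose=0, random_state=123)

# Fit model and predict clusters
# (data_categorical: integer-encoded categorical features, one row per observation)
model.fit(data_categorical)
preds = model.predict(data_categorical)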
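
Since the question's data already carries a label (user_segment), a latent mixture may not be strictly necessary: a supervised generative model can estimate a prior per segment and a categorical distribution over site categories within each segment, then apply Bayes' rule. Below is a minimal pure-Python sketch of that idea. The helper names `fit` and `posterior` are hypothetical (not part of sklearn or StepMix), the rows are just the four from the question, and the add-one smoothing is an assumption on my part.

```python
from collections import Counter, defaultdict

# Toy rows copied from the question: (site_category, user_segment)
rows = [
    ("Sports:News", "efk-457"),
    ("Music:Entertainment", "asl-567"),
    ("Sports:News", "asl-567"),
    ("Sports:News", "efk-457"),
]


def fit(rows, alpha=1.0):
    """Count segments and per-segment categories (hypothetical helper)."""
    segment_counts = Counter(seg for _, seg in rows)
    category_counts = defaultdict(Counter)
    for cat, seg in rows:
        category_counts[seg][cat] += 1
    n_categories = len({cat for cat, _ in rows})
    return segment_counts, category_counts, n_categories, alpha


def posterior(model, category):
    """Return P(segment | category) for every segment via Bayes' rule."""
    segment_counts, category_counts, n_categories, alpha = model
    total = sum(segment_counts.values())
    joint = {}
    for seg, n_seg in segment_counts.items():
        prior = n_seg / total
        # Add-alpha smoothed estimate of P(category | segment)
        likelihood = (category_counts[seg][category] + alpha) / (
            n_seg + alpha * n_categories
        )
        joint[seg] = prior * likelihood
    norm = sum(joint.values())
    return {seg: p / norm for seg, p in joint.items()}


model = fit(rows)
post = posterior(model, "Sports:News")
best = max(post, key=post.get)
print(best, post)
```

With more than one categorical feature, the same counts can be kept per feature and the per-feature likelihoods multiplied, which is the naive Bayes assumption. Four rows are far too few to say anything about how well this would do on the real data.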