
I could not find a good explanation of what exactly is going on when using glm with pymc3 for logistic regression. So I compared the GLM version to an explicit pymc3 model. I started to write an IPython notebook for documentation; see:

http://christianherta.de/lehre/dataScience/machineLearning/mcmc/logisticRegressionPymc3.slides.php

What I don't understand is:

  • What prior is used for the parameters in GLM? I assume they are also normally distributed. I got different results with my explicit model compared to the built-in GLM (see link above).

  • With less data the sampling gets stuck and/or I get really poor results. With more training data I could not observe this behaviour. Is this normal for MCMC?

There are more issues in the notebook.

Thanks for your answer.

chris elgoog

1 Answer


What prior is used for the Parameters in GLM

GLM is the name for a family of methods. Two popular priors are Gaussian (corresponds to L2 regularization) and Laplacian (corresponds to L1); usually the first one is used.
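The prior/regularizer correspondence can be checked numerically: the negative log of an independent Normal(0, 1/tau) prior on each coefficient equals the ridge penalty (tau/2)·||w||², up to a constant that does not depend on w. A minimal NumPy sketch (function names and values are illustrative, not pymc3 API):

```python
import numpy as np

def neg_log_normal_prior(w, tau):
    # -log of independent Normal(0, 1/tau) priors on each coefficient:
    # 0.5 * tau * w_j**2 per coefficient, plus a normalization constant
    # that does not depend on w.
    d = len(w)
    return 0.5 * tau * np.sum(w ** 2) - 0.5 * d * np.log(tau / (2 * np.pi))

def l2_penalty(w, tau):
    # The corresponding ridge / L2 penalty term.
    return 0.5 * tau * np.sum(w ** 2)

tau = 4.0
w1 = np.array([0.5, -0.7])
w2 = np.array([1.2, 0.3])

# Since the two expressions differ only by a w-independent constant,
# their differences between any two coefficient vectors agree exactly,
# so the MAP estimate equals the L2-regularized optimum.
diff_prior = neg_log_normal_prior(w1, tau) - neg_log_normal_prior(w2, tau)
diff_penalty = l2_penalty(w1, tau) - l2_penalty(w2, tau)
```

The same argument with a Laplace prior, whose negative log density is tau·|w_j| plus a constant, gives the L1 penalty.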

With less data the sampling gets stuck and/or I get really poor results. With more training data I could not observe this behaviour. Is this normal for MCMC?

Did you play with the prior parameters? If the model behaves badly with a small amount of data, this may be due to a strong prior (= too much regularization), which becomes the dominant term in the optimization.
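The effect is easy to see on the MAP estimate: with few data points a strong prior (high tau, i.e. small prior variance) dominates the posterior and pulls the coefficients toward zero; with more data the likelihood takes over. A NumPy sketch with synthetic data and plain gradient descent (all names and values hypothetical, not the asker's model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_logistic(X, y, tau, lr=0.05, steps=5000):
    # MAP estimate for logistic regression with Normal(0, 1/tau) priors,
    # found by gradient descent on the negative log posterior.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        # Gradient = likelihood term + prior (L2) term.
        grad = X.T @ (sigmoid(X @ w) - y) + tau * w
        w -= lr * grad / n
    return w

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])

norms = {}
for n in (10, 500):
    X = rng.normal(size=(n, 2))
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    norms[n] = {
        "weak": np.linalg.norm(map_logistic(X, y, tau=0.01)),
        "strong": np.linalg.norm(map_logistic(X, y, tau=100.0)),
    }
# With n = 10 the strong prior shrinks the estimate far toward zero;
# with n = 500 the likelihood dominates and the shrinkage is much milder.
```

The same intuition carries over to sampling: when the prior swamps a small likelihood, the posterior concentrates near the prior mode rather than near values that fit the data.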

Alleo
  • Yes, I played with the prior on the parameters of my explicit model. I used Gaussians and changed the tau (precision) value (the inverse of the variance). I know that GLMs are a family of methods. Here I used family=pymc3.glm.families.Binomial(); to my knowledge that corresponds to standard logistic regression. With pymc3's GLMs (patsy / R syntax) I don't know how to change the prior. I didn't find explicit documentation, so I started to define my own explicit model. – chris elgoog Dec 05 '15 at 21:51