
I am wondering whether there are any libraries (or, frankly, techniques) for quickly computing properties of a Gaussian mixture model (and other mixture models). That is, given a list of weights, means, and standard deviations, I'd like to quickly summarize the model, for example with confidence intervals, and obtain values for metrics like CRPS. Are there any suggestions for how I might go about this in a computationally efficient way?

Sam H
  • [scikit-learn](https://scikit-learn.org/stable/modules/mixture.html) is one of the most famous libraries you could use. – Ali_Sh Sep 10 '21 at 19:00
  • Also see [this link](https://stats.stackexchange.com/questions/23713/python-packages-for-working-with-gaussian-mixture-models-gmms) on stackexchange. – Ali_Sh Sep 10 '21 at 19:42
  • That's much appreciated. I didn't find that link when I searched. I don't think scikit-learn has any of this functionality, though. Scikit-learn's mixture classes are mostly designed for fitting with the EM algorithm, whereas I was wondering about getting information about the properties of the GMM. – Sam H Sep 11 '21 at 22:44
  • Sam, this is a great question, but as it is more conceptual, it's off topic here and more appropriate for stats.stackexchange.com. That said, I don't know of any libraries to compute various quantities for mixture distributions, but, depending on what you need, you can probably get started by just working from the definition. E.g. mixture c.d.f. = weighted sum of per-component c.d.f.'s. From that you can work towards confidence intervals. – Robert Dodier Sep 13 '21 at 16:20
  • Note that mixture distributions will have interesting properties such as disjoint confidence intervals; try to embrace the interesting stuff, don't make restrictive assumptions. When you have gotten as far as you can working from definitions, then try stats.se to ask about the specific problem you're having. For what it's worth, I worked out some specific results for mixture distributions. See appendix C in my dissertation: http://riso.sourceforge.net/docs/dodier-dissertation.pdf and search for "mixture". It's geared towards the stuff I was working on but maybe it can be inspiring in some way. – Robert Dodier Sep 13 '21 at 16:28
  • PS. What is the abbreviation CRPS? I'm not familiar with it. – Robert Dodier Sep 13 '21 at 16:38
  • Very sorry about being off topic. I never know whether Cross Validated, stats or here is the right place for questions that might be solved with clever math or with a nice implementation. CRPS is the continuous ranked probability score. I'm not sure how common it is, but this is a nice description: https://www.lokad.com/continuous-ranked-probability-score. It's just a convenient generalization of MAE. Thank you for the advice and the link too; it is much appreciated. – Sam H Sep 13 '21 at 19:31
  • What does quick/fast mean in this context? How many thousand times per second does it need to be done? – Jon Nordby Aug 19 '22 at 14:04
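
The "work from the definition" suggestion in the comments can be sketched in a few lines of standard-library Python. This is only an illustration, not an existing library's API: the function names (`mixture_cdf`, `mixture_quantile`, `mixture_crps`) are made up for this example, the components are assumed to be univariate Gaussians with weights summing to 1, and the quantile is found by plain bisection. The CRPS part uses the identity CRPS(F, y) = E|X - y| - ½ E|X - X'|, where X and X' are independent draws from F. For a Gaussian mixture this has a closed form because E|Z| for a normal Z is known.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / SQRT2PI

def norm_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / SQRT2))

def mixture_cdf(x, weights, means, sds):
    """Mixture c.d.f. = weighted sum of per-component c.d.f.'s."""
    return sum(w * norm_cdf((x - m) / s)
               for w, m, s in zip(weights, means, sds))

def mixture_quantile(p, weights, means, sds, iters=100):
    """Invert the (monotone) mixture c.d.f. by bisection, for 0 < p < 1."""
    # Bracket far enough out that the c.d.f. is 0 and 1 to machine precision.
    lo = min(m - 40.0 * s for m, s in zip(means, sds))
    hi = max(m + 40.0 * s for m, s in zip(means, sds))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_abs(mu, var):
    """E|Z| for Z ~ N(mu, var)."""
    sigma = math.sqrt(var)
    z = mu / sigma
    return 2.0 * sigma * norm_pdf(z) + mu * (2.0 * norm_cdf(z) - 1.0)

def mixture_crps(y, weights, means, sds):
    """Closed-form CRPS of a Gaussian mixture at observation y."""
    # E|X - y|: each component contributes E|N(m - y, s^2)|.
    term1 = sum(w * expected_abs(y - m, s * s)
                for w, m, s in zip(weights, means, sds))
    # E|X - X'|: the difference of components i and j is
    # N(m_i - m_j, s_i^2 + s_j^2); this double sum is O(K^2).
    term2 = sum(wi * wj * expected_abs(mi - mj, si * si + sj * sj)
                for wi, mi, si in zip(weights, means, sds)
                for wj, mj, sj in zip(weights, means, sds))
    return term1 - 0.5 * term2

# Hypothetical two-component example.
weights, means, sds = [0.3, 0.7], [-2.0, 1.5], [0.5, 1.0]
interval_95 = (mixture_quantile(0.025, weights, means, sds),
               mixture_quantile(0.975, weights, means, sds))
score = mixture_crps(0.8, weights, means, sds)
```

Note that `interval_95` is the equal-tailed interval, which is always a single contiguous range. The disjoint intervals mentioned in the comments arise with highest-density regions, which this sketch does not compute. For many models or observations at once, the same formulas vectorize directly over arrays.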

0 Answers