I am using PyMC3 to cluster my grouped data. Basically, I have g vectors and would like to group them into m clusters. However, I have two problems.

The first is that PyMC3 seems to handle only one-dimensional data, not vectors. The second is that I do not know how to extract the cluster id for each raw data point. I can extract the number of components (k) and the corresponding weights, but I cannot extract the id indicating which cluster each point belongs to.

Any ideas or comments are welcome!

Hannah
  • With respect to your first question, I have just asked essentially the same one here: https://discourse.pymc.io/t/sampling-semantics-of-multiple-observed-variables/3152 Perhaps one of the PyMC3 developers can answer this. – Robert P. Goldman Apr 24 '19 at 20:34

1 Answer

If I understand you correctly, you are trying to extract which category (1 through k) each data point belongs to. A Dirichlet random variable, however, only produces a probability vector. Use that vector as the prior for a Categorical RV; sampling from the Categorical then yields a numbered category for each point.
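To make the distinction concrete, here is a minimal NumPy sketch of the generative idea only. It is not a PyMC3 model: the values of `k` and `n_points`, the flat `alpha`, and the variable names are all assumptions chosen for illustration. In PyMC3 the corresponding pieces would presumably be `pm.Dirichlet` for the weights and `pm.Categorical` for the per-point labels.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 3          # number of clusters (assumed for illustration)
n_points = 10  # number of data points (assumed for illustration)

# A Dirichlet draw is a probability vector over the k clusters,
# not a cluster label.
weights = rng.dirichlet(alpha=np.ones(k))

# Sampling from a categorical distribution with those probabilities
# gives one integer cluster id per data point.
cluster_ids = rng.choice(k, size=n_points, p=weights)

print("weights:", weights)
print("cluster ids:", cluster_ids)
```

Note that the ids here run from 0 to k - 1 rather than 1 to k. In a real mixture model the labels would also be conditioned on the observed data, so this sketch only shows where the cluster ids come from, not how they are inferred.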

Camden Cheek