The mixture model code in scikit-learn works on a list of individual data points, but what if you have a histogram instead? That is, I have a density value for every voxel, and I want the mixture model to approximate that density. Is this possible? I suppose one solution would be to sample values from the histogram, but that shouldn't be necessary.

cgreen
  • welcome to slashdot! i don't know much about scikit, but if you can post some more details, including maybe some sample code for discussion, you may get more responses. – Corley Brigman Oct 24 '13 at 00:26
  • It depends on the application. Do you actually need the locations and standard deviations of the Gaussian mixture model? Are you using it for classification? If you really need a Gaussian mixture model, you could use a multi-peak Gaussian fit to approximate the histogram. But that wouldn't be the easiest route if you simply want smooth interpolation; in that case something like bicubic interpolation may be simpler. – willtalmadge Oct 24 '13 at 00:36

2 Answers

Scikit-learn has extensive utilities and algorithms for kernel density estimation (KDE), which is aimed specifically at inferring distributions from things like histograms. See the documentation here for some examples. If you have no prior expectations about the distribution of your data, KDE might be a more general approach.
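
As a rough sketch of how that could look for binned data, the bin centres can be passed to `KernelDensity` with the histogram values as `sample_weight` (this assumes a scikit-learn release recent enough that `KernelDensity.fit` accepts `sample_weight`). The array `Z`, the grid size and the bandwidth are made-up stand-ins, not values from the question.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical 2D histogram standing in for the voxel densities
x, y = np.mgrid[0:20, 0:20]
Z = np.exp(-((x - 6) ** 2 + (y - 8) ** 2) / 18.0)

# One row per bin centre, weighted by the value stored in that bin
centres = np.column_stack([x.ravel(), y.ravel()]).astype(float)
weights = Z.ravel()
keep = weights > 0  # drop empty bins; weights are expected to be positive

# The bandwidth is an assumption and would normally be tuned
kde = KernelDensity(kernel="gaussian", bandwidth=1.0)
kde.fit(centres[keep], sample_weight=weights[keep])

# score_samples returns log-densities; evaluate back on the grid
density = np.exp(kde.score_samples(centres)).reshape(Z.shape)
```

No resampling from the histogram is needed here, since each bin contributes in proportion to its weight.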

Kyle Kastner

For a 2D histogram Z (your 2D array of voxels):

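import numpy as np
from sklearn.mixture import GaussianMixture

# create the co-ordinate values
X, Y = np.mgrid[0:Z.shape[0], 0:Z.shape[1]]

# artificially create a list of points from your histogram
data_points = []
for x, y, z in zip(X.ravel(), Y.ravel(), Z.ravel()):
    # add the data point / voxel (x, y) as many times as it occurs
    # in the histogram; z must be a non-negative count, so scale and
    # round density values to integers first if necessary
    for _ in range(int(z)):
        data_points.append((x, y))

# now fit your GMM
# (older scikit-learn releases exposed this class as sklearn.mixture.GMM;
# set n_components to the number of Gaussians you expect)
gmm = GaussianMixture()
gmm.fit(np.array(data_points))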

Though, as @Kyle Kastner points out, there are better methods for achieving this. For a start, your histogram is already binned, which will have lost you some resolution. Can you get hold of the raw data from before it was binned?
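
If only the binned values are available, one alternative to replicating points is to run EM with each bin centre weighted by its histogram value. The following is a minimal NumPy sketch of that idea, not scikit-learn's implementation; `fit_weighted_gmm`, its parameters and the synthetic `Z` are all hypothetical names and data introduced for illustration.

```python
import numpy as np

def fit_weighted_gmm(points, weights, n_components=2, n_iter=100,
                     reg=1e-6, seed=0):
    """EM for a Gaussian mixture where each point carries a weight."""
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # treat the histogram as a probability mass
    n, d = points.shape
    rng = np.random.default_rng(seed)

    # Initialise means by drawing bins in proportion to their mass
    means = points[rng.choice(n, size=n_components, replace=False, p=w)]
    centred = points - w @ points
    cov0 = (w[:, None] * centred).T @ centred + reg * np.eye(d)
    covs = np.repeat(cov0[None], n_components, axis=0)
    pis = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: log of (mixing weight * Gaussian density) per component
        log_r = np.empty((n, n_components))
        for k in range(n_components):
            diff = points - means[k]
            maha = np.sum(diff * np.linalg.solve(covs[k], diff.T).T, axis=1)
            _, logdet = np.linalg.slogdet(covs[k])
            log_r[:, k] = np.log(pis[k]) - 0.5 * (
                maha + logdet + d * np.log(2 * np.pi))
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: responsibilities are scaled by the bin mass
        wr = w[:, None] * r
        nk = wr.sum(axis=0) + 1e-12  # small floor avoids division by zero
        pis = nk / nk.sum()
        means = (wr.T @ points) / nk[:, None]
        for k in range(n_components):
            diff = points - means[k]
            covs[k] = ((wr[:, k, None] * diff).T @ diff / nk[k]
                       + reg * np.eye(d))
    return pis, means, covs

# Synthetic 2D histogram with two blobs (illustrative data only)
X, Y = np.mgrid[0:40, 0:40]
Z = (np.exp(-((X - 10) ** 2 + (Y - 10) ** 2) / 18.0)
     + np.exp(-((X - 30) ** 2 + (Y - 28) ** 2) / 32.0))

points = np.column_stack([X.ravel(), Y.ravel()])
weights = Z.ravel()
pis, means, covs = fit_weighted_gmm(points, weights, n_components=2)
```

Because the weights enter only through sums, `Z` can hold non-integer densities directly. The fit still inherits the binning resolution, and like any EM run it can settle in a local optimum depending on the initial means.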

danodonovan