
I am using the R package mclust to separate data into clusters. For this, I am using a one-dimensional model that allows the variances of the normal distributions underlying the clustering to differ between clusters (the "V" model in the package).

The call looks like this: Mclust(dataToCluster, G=possibleClusters, modelNames=c("V")). To define the possible numbers of clusters, I use a vector possibleClusters, e.g. 1:4 to allow for one to four clusters.
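For reproducibility, here is a minimal sketch of that call. The simulated data and the seed are assumptions made up for illustration and are not my real data; only the Mclust call itself matches what I actually run.

```r
library(mclust)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Simulated one-dimensional data (illustration only, not the real data):
# three normal components with different standard deviations.
dataToCluster <- c(
  rnorm(100, mean = 110, sd = 18),
  rnorm(110, mean = 205, sd = 11),
  rnorm(80,  mean = 265, sd = 16)
)

# Allow one to four clusters; "V" = univariate, variable variance.
possibleClusters <- 1:4

result <- Mclust(dataToCluster, G = possibleClusters, modelNames = c("V"))

result$G           # number of clusters selected by BIC
result$parameters  # mixing proportions, means and variances
```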

As a result of the clustering, after automatic model selection by Mclust using the BIC, I get a result with parameters of a normal distribution. For a model with three clusters, it might look like this:

# output shortened and commented for better readability
>   result$parameters   

# proportion of data points per cluster ("lambda")
$pro   
[1] 0.3459566 0.3877521 0.2662913

# mean of normal distribution per cluster ("mu")    
$mean  
       1        2        3 
110.3197 204.0477 265.0929 

# variances per cluster ("sigma sq")   
$variance$sigmasq   
[1] 342.5032 128.4648 254.9257

However, I do have some knowledge about what these parameters are supposed to look like a priori. For example, I might know that:

  1. sigmasq must be between 100 and 1000 units
  2. the mean value for adjacent clusters must be at least 40 units apart
  3. if there are three clusters, the mean value of the third cluster must be at least 215 units
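To make the three rules concrete, they can be written as a check on the fitted parameters. This is only a sketch: checkConstraints is a hypothetical helper of my own, not part of mclust, and it assumes that "adjacent" and "third" refer to clusters ordered by their mean. It can only flag a fit that breaks the rules after the fact; it does not constrain the fit itself, which is what I am actually after.

```r
# Hypothetical helper (not part of mclust): tests fitted parameters
# against the three a priori rules listed above.
# Assumption: clusters are ordered by their mean.
checkConstraints <- function(means, sigmasq,
                             minVar = 100, maxVar = 1000,
                             minGap = 40, minThirdMean = 215) {
  means <- sort(means)
  c(
    # Rule 1: every variance lies within [minVar, maxVar]
    rule1 = all(sigmasq >= minVar & sigmasq <= maxVar),
    # Rule 2: adjacent means are at least minGap apart
    # (trivially TRUE for a single cluster)
    rule2 = all(diff(means) >= minGap),
    # Rule 3: only applies when there are exactly three clusters
    rule3 = length(means) != 3 || means[3] >= minThirdMean
  )
}

# Applied to the example output shown above
check <- checkConstraints(
  means   = c(110.3197, 204.0477, 265.0929),
  sigmasq = c(342.5032, 128.4648, 254.9257)
)
check
```

The example output above happens to satisfy all three rules; a result like A1 or B1 would fail at least one of them.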

Here is a graphical example of possible clustering results (the x axis is in the units of the mean and of sigma, i.e. the square root of sigmasq): [Figure: Results of clustering]

Taking into account the constraints given above, example plots A1 (which violates rules 1 and 2) and B1 (which violates rules 2 and 3) can't be correct. Instead, the results should look more like A2 and B2, which were produced using slightly different data. Note that, once these constraints are taken into account, the “best” number of clusters might change (A1 vs. A2).

I would like to know how to include this kind of a priori information when using the Mclust function. The function does have a prior argument, which might allow for this, but I wasn't able to figure out how it would work. How can I bring these constraints into the function?
