1

When trying to perform a univariate mixture model estimation, R package mclust produces the following output with an error:

----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm 
----------------------------------------------------

Mclust V (univariate, unequal variance) model with 9 components:

    log.likelihood     n               df               BIC              ICL
 -25576.3596860173 53940 26.0000000000001 -51436.0056895487 -84761.892259584

Clustering table:
    1     2     3     4     5     6     7     8     9 
 3081  8923  8177  3017  8742  6568 10378  3650  1404 
Error in object[as.character(G), modelNames, drop = FALSE] : 
  incorrect number of dimensions

Reproducible example:

library(mclust)

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- log10(diamonds$price)

# determine number of components
mc <- Mclust(myData)
print(summary(mc))

Additional diagnostic information:

> str(mc)
List of 15
 $ call          : language Mclust(data = myData)
 $ data          : num [1:53940, 1] 2.51 2.51 2.51 2.52 2.53 ...
 $ modelName     : chr "V"
 $ n             : int 53940
 $ d             : num 1
 $ G             : int 9
 $ BIC           : num [1:9, 1:2] -64689 -57330 -54802 -52807 -52721 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : chr [1:9] "1" "2" "3" "4" ...
 .. ..$ : chr [1:2] "E" "V"
 ..- attr(*, "G")= num [1:9] 1 2 3 4 5 6 7 8 9
 ..- attr(*, "modelNames")= chr [1:2] "E" "V"
 ..- attr(*, "oneD")= logi TRUE
 $ bic           : num -51436
 $ loglik        : num -25576
 $ df            : num 26
 $ hypvol        : num NA
 $ parameters    :List of 4
 ..$ Vinv    : NULL
 ..$ pro     : num [1:9] 0.0606 0.1558 0.1628 0.0411 0.1655 ...
 ..$ mean    : Named num [1:9] 2.7 2.86 3.03 3.24 3.39 ...
 .. ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 ..$ variance:List of 5
 .. ..$ modelName: chr "V"
 .. ..$ d        : num 1
 .. ..$ G        : int 9
 .. ..$ sigmasq  : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 .. ..$ scale    : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 $ classification: num [1:53940] 1 1 1 1 1 1 1 1 1 1 ...
 $ uncertainty   : num [1:53940] 0.0132 0.0132 0.0134 0.015 0.0152 ...
 $ z             : num [1:53940, 1:9] 0.987 0.987 0.987 0.985 0.985 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : NULL
 .. ..$ : NULL
 - attr(*, "class")= chr "Mclust"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rebmix_2.6.1       ddst_1.03          evd_2.3-0          orthopolynom_1.0-5
 [5] polynom_1.3-8      mixtools_1.0.2     segmented_0.4-0.0  boot_1.3-11       
 [9] mclust_4.3         MASS_7.3-34       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.1      colorspace_1.2-4 digest_0.6.4     ggplot2_1.0.0   
 [5] grid_3.1.1       gtable_0.1.2     munsell_0.4.2    plyr_1.8.1      
 [9] proto_0.3-10     reshape2_1.4     scales_0.2.4     stringr_0.6.2   
[13] tools_3.1.1     
Aleksandr Blekh
  • 2,462
  • 4
  • 32
  • 64
  • I'm surprised that this question has been migrated from CrossValidated (CV), as the problem is unlikely related to R language or the `mclust` package per se, but likely related to data or statistical aspects of the data. Strangely, but I don't see any notes/comments from people on CV in this regard. – Aleksandr Blekh Sep 04 '14 at 21:11

1 Answers1

1

I have figured out myself what was the problem. The error message has been produced by another mclust function mclustModel(), which my code has been executing after the Mclust() call. The error was due to the fact that mclustModel() function requires a different type of object passed as a second argument. The correct call sequence, when using mclustModel() (instead of Mclust()), is shown below.

mc <- mclustBIC(myData)
print(summary(mc, myData, parameters = TRUE))
plot(mc)

bestModel <- mclustModel(myData, mc)
print(summary(bestModel, myData))
Aleksandr Blekh
  • 2,462
  • 4
  • 32
  • 64
  • Another issue, which might be worth posting as a separate question, is how to automatically determine the minimum number of mixture components, responsible for an arbitrary large share (say, 95%) of the distribution. I haven't seen a cut-off parameter like that within `mclust` package. – Aleksandr Blekh Sep 04 '14 at 23:41