
I want to call NbClust() on several data frames. I do so by passing each of them through a for loop that contains the NbClust() call. The code looks like this:

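library(NbClust)
# All non-empty combinations of the columns of df
variations = unlist(lapply(seq_along(df), function(x) combn(df, x, simplify=FALSE)), recursive=FALSE)
results = vector("list", length(variations))  # one NbClust result per combination
for(i in seq_along(variations)){
  sub_df = variations[[i]]  # [[ ]] extracts the data frame itself and leaves df untouched
  results[[i]] = NbClust(scale(sub_df), distance="euclidean", min.nc=2, max.nc=10, method="complete")
}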
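
To make the structure of `variations` concrete, here is a minimal base-R sketch on a hypothetical three-column data frame (`toy` is an assumption for illustration, not the data from the question). It shows how many subsets are generated and the difference between single and double brackets when pulling one out of the list:

```r
# Hypothetical toy data frame with three numeric columns
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# All non-empty column subsets: 2^3 - 1 = 7 data frames
variations <- unlist(
  lapply(seq_along(toy), function(k) combn(toy, k, simplify = FALSE)),
  recursive = FALSE
)

n_subsets   <- length(variations)      # number of column subsets
single      <- variations[1]           # single brackets keep the list wrapper
extracted   <- variations[[1]]         # double brackets return the data frame
cols_per_df <- sapply(variations, ncol)

print(n_subsets)
print(class(single))
print(class(extracted))
print(cols_per_df)
```

Note that the first subsets contain a single column each, so the loop also runs the clustering on one-dimensional data.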

Unfortunately, it always generates the error below. Strangely enough, if I apply the same function call without the loop (i.e. to only one data frame), it works perfectly. So what is wrong?

I have had a look at the source code of NbClust and indeed there is a line that contains the code of the error message but I am unable to change the code accordingly. Do you have any idea what the problem might be?

Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + : missing value where TRUE/FALSE needed

Additionally it produces the following warnings:

In addition: Warning messages:
1: In max(DiffLev[, 5], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
2: In matrix(c(results), nrow = 2, ncol = 26) :
  data length [51] is not a sub-multiple or multiple of the number of rows [2]
3: In matrix(c(results), nrow = 2, ncol = 26, dimnames = list(c("Number_clusters",  :
  data length [51] is not a sub-multiple or multiple of the number of rows [2]
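
For readers unfamiliar with these messages, the following base-R sketch reproduces the same error and the first warning in isolation. It only illustrates what the messages mean in general; it does not show which value inside NbClust() is actually missing.

```r
# `if` needs a single TRUE or FALSE; an NA condition raises the error
msg <- tryCatch(
  if (NA > 1) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(msg)

# max() with na.rm = TRUE over an all-NA vector warns and returns -Inf
empty_max <- suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))
print(empty_max)

# Arithmetic on -Inf can give NaN, and a comparison involving NaN yields NA,
# which is exactly the kind of condition `if` rejects
cmp <- (empty_max - empty_max) <= 1
print(cmp)
```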

Data looks as follows:

df = structure(list(GDP = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17, 4.9, 
7.9, 2, 14.2, 48.2, 17.1, 10.4, 37.5, 1.6, 49.5, 10.8, 6.2, 7.1, 
7.8, 3, 3.7, 4.2, 8.7, 2), Population = c(1.22, 0.06, 0, 0.54, 
2.34, 0.74, 1.03, 1.405095932, 0.791124402, 2.746318326, 0.026149254, 
11.1252, 0.05183432, 2.992952671, 0.705447655, 0, 0.900246028, 
1.15476828, 0, 1.150673397, 1.441975309, 0, 0.713777778, 1.205504587, 
1.449230769, 0.820985507), Birth.rate = c(11.56, 146.75, 167.23, 
7, 7, 7, 10.07, 47.42900998, 20.42464115, 7.520608751, 7, 7, 
15.97633136, 15.1531143, 20.41686405, 7, 22.60379293, 7, 7, 18.55225902, 
7, 7.7, 7, 7, 7, 7), Income = c(54L, 94L, 37L, 95L, 98L, 31L, 
78L, 74L, 81L, 95L, 16L, 44L, 63L, 95L, 20L, 95L, 83L, 98L, 98L, 
84L, 62L, 98L, 98L, 97L, 98L, 57L), Savings = c(56.73, 56.49, 
42.81, 70.98, 88.24, 35.16, 46.18, 35.043, 46.521, 58.024, 22.738, 
60.244, 77.807, 80.972, 13.08, 40.985, 46.608, 63.32, 51.45, 
74.803, 73.211, 50.692, 65.532, 83.898, 60.857, 40.745)), .Names = c("GDP", "Population", "Birth.rate", "Income", "Savings"), class = "data.frame", row.names = c(NA, -26L))
  • @Stallion: Do you have any idea? – Jonathan Rhein Apr 26 '16 at 09:59
  • Any missing values in your data? – Has QUIT--Anony-Mousse Apr 28 '16 at 06:46
  • No, at least is.na() doesn't say so. – Jonathan Rhein Apr 28 '16 at 12:44
  • The only workaround I could figure out so far was to copy the whole `NbClust()` function from the source file into my script and comment out the lines that cause the error. This way one of the 26 indices of `NbClust()` (the "Frey" index) is not computed. This is certainly not the optimal way, but I wasn't able to figure out how to change my code or data so that the error no longer appears. Any ideas? – Jonathan Rhein Apr 28 '16 at 12:51
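
A side note on the missing-value check discussed in the comments: `is.na()` on the raw data does not catch values that only appear after `scale()`, for example when a column is constant. The sketch below uses a hypothetical helper, `has_nonfinite_after_scale()`, to test for this. The posted data does not appear to contain a constant column, so this is a sanity check rather than a diagnosis.

```r
# Hypothetical helper: TRUE if standardising `d` yields any NA, NaN or Inf
has_nonfinite_after_scale <- function(d) {
  any(!is.finite(scale(d)))
}

ok  <- data.frame(x = c(1, 2, 3), y = c(2, 4, 9))
bad <- data.frame(x = c(1, 2, 3), y = c(5, 5, 5))  # constant column: sd is 0

ok_flag  <- has_nonfinite_after_scale(ok)
bad_flag <- has_nonfinite_after_scale(bad)
print(ok_flag)
print(bad_flag)
```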

1 Answer


Some of the indices that NbClust() computes (the `index` argument) are not suited to every dataset or type of data. You can select the most appropriate ones or use all of them. When using all of them, it often happens that one of them produces an error (which is not a bug) and stops the whole run. Computing each index separately and catching the error, so that it no longer stops the loop, could be an alternative:

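library(NbClust)

# sum.sn (the data) and vc.K.max (the largest number of clusters to try)
# are not defined in this post; substitute your own data and upper bound.
vc.method <- c("kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "beale", "ratkowsky", "ball", "ptbiserial", "pseudot2", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "alllong")

val.nb <- c()

for (method in seq_along(vc.method)) {
  tryCatch({
    en.nb <- NbClust(na.omit(sum.sn), distance = "euclidean", min.nc = 2,
                     max.nc = vc.K.max, method = "kmeans",
                     index = vc.method[method])
    # Keep the number of clusters proposed by this index
    val.nb <- c(val.nb, as.numeric(en.nb$Best.nc[1]))
  }, error = function(e) {
    # Report the failing index and carry on with the next one
    cat("ERROR in", vc.method[method], ":", conditionMessage(e), "\n")
  })
}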
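
The same `tryCatch()` pattern could also be wrapped around the loop over column subsets from the question. The sketch below is one possible shape, not a tested fix for the Frey error: `run_safely()` is a hypothetical helper, and `stand_in()` is a placeholder that replaces the NbClust() call so the example runs without the package installed.

```r
# Hypothetical helper: apply `f` to each element, recording failures as NULL
run_safely <- function(items, f) {
  lapply(items, function(item) {
    tryCatch(f(item), error = function(e) {
      message("Skipped: ", conditionMessage(e))
      NULL
    })
  })
}

# Stand-in for the NbClust() call so the sketch runs without the package:
# it fails on single-column input and succeeds otherwise.
stand_in <- function(d) {
  if (ncol(d) < 2) stop("need at least two columns")
  ncol(d)
}

inputs  <- list(data.frame(a = 1:3), data.frame(a = 1:3, b = 4:6))
results <- run_safely(inputs, stand_in)

# TRUE marks the inputs whose call raised an error
failed <- vapply(results, is.null, logical(1))
print(failed)
```

With NbClust installed, `f` could instead be something like `function(d) NbClust(scale(d), distance = "euclidean", min.nc = 2, max.nc = 10, method = "complete")`, and `failed` would then show which column subsets triggered the error.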