r: NbClust() very unreliable / buggy?

Question

I am running NbClust() on many different data frames (different data, different dimensionality). In most cases it works fine, but in some it fails with errors that seem to come from a computational problem inside some of the indices that NbClust() computes.

This is what my code looks like

library(NbClust)

NbClust(df, distance="euclidean", min.nc=3, max.nc=5, method = "complete")

This is the error I get

Error in if ((resCritical[ncB - min_nc + 1, 3] >= alphaBeale) &&
(!foundBeale)) { :    missing value where TRUE/FALSE needed

Another error that I come across very often is the following

Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + 
 :    missing value where TRUE/FALSE needed
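
Both messages are R's standard error for an `if` whose condition evaluates to `NA`. The following is a minimal sketch of that mechanism only, not NbClust's actual code: it assumes the `NA` starts out as a `NaN` passed through `pf()` (the function named in the comments below), and the values `df1 = 2`, `df2 = 10` and the threshold `0.1` are arbitrary illustrations.

```r
# Assumption: a NaN statistic stands in for whatever NbClust passes to pf().
p <- pf(NaN, df1 = 2, df2 = 10)   # pf() propagates the NaN

# NaN >= 0.1 is NA, and NA && TRUE is still NA
cond <- (p >= 0.1) && TRUE

# if () cannot branch on NA, so it raises the error seen above
msg <- tryCatch(
  if (cond) "accepted" else "rejected",
  error = function(e) conditionMessage(e)
)
print(msg)

# isTRUE() treats NA as FALSE, so a guarded condition does not fail
guarded <- if (isTRUE(cond)) "accepted" else "rejected"
print(guarded)
```

This suggests the failures come from an index whose test statistic is undefined for a particular number of clusters, not from the call itself being wrong.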

Has anyone ever encountered similar problems? Or does anyone know why NbClust() is so unreliable? Any workarounds?

The data looks as follows

df = structure(list(Rate = c(-0.161, -0.519, 1.163, -0.781, -0.755, 
2.252, -0.206, -0.796, -0.803, 1.444, -0.652, -0.541, -0.759, 
-0.309, 0.945, -0.202, -0.449, 0.551, -0.774, 0.993, -0.434, 
-0.604, -0.571, -0.545, -0.722, -0.696, -0.678, -0.512, -0.759, 
2.857, 0.145, -0.206, -0.689, 0.514, 2.373, -0.659, 0.628, 0.2, 
2.746, -0.781, -0.704, 2.019, -0.826, -0.051, 0.034, -0.693, 
-0.047, -0.571, -0.335, -0.073), Losses = c(-0.142, 4.327, 5.004, 
-0.293, -0.293, -0.293, -0.191, -0.293, -0.293, -0.293, 1.044, 
0.151, -0.276, -0.293, -0.293, 0.004, -0.024, 0.151, -0.293, 
0.223, -0.293, -0.293, 0.089, -0.293, -0.27, -0.293, -0.293, 
-0.293, -0.293, -0.287, -0.03, 0.26, -0.293, -0.223, -0.293, 
-0.293, -0.293, 0.066, -0.293, -0.293, -0.293, -0.215, 0.086, 
0.086, -0.522, -0.518, -0.497, -0.522, -0.516, -0.518)), .Names = c("Rate", 
"Losses"), row.names = c(NA, -50L), class = "data.frame")

A former unanswered question regarding the same problem can be found here

Can you please provide a reproducible example? Without having sample data that reproduces this problem, it's going to be very difficult to diagnose and fix. — Andrie, May 18 '16 at 10:14
I had the same problem once. I chose to manually exclude the problematic indices in the argument `index` of the function. — agenis, May 18 '16 at 11:08
@agenis, it is great to hear that I am not the only one with this problem. Do you still remember which indices you excluded to make it work? And do you know why `NbClust()` behaves so strangely/unreliably? — Jonathan Rhein, May 18 '16 at 12:39
@Andrie, ok, please find my edits above in the description of my question. I would be very thankful for any kind of advice — Jonathan Rhein, May 18 '16 at 13:41
@Andrie, since you seem to be one of the really experienced Stack Overflow users, do you think the question is worth putting a bounty on? I would really like to find out what is causing the error... — Jonathan Rhein, May 18 '16 at 19:09
Make it minimally reproducible first. Add all the necessary library() calls. Run your code in a clean session and ensure the error happens. — Andrie, May 18 '16 at 22:53
@Andrie, thank you, I edited my code and it should now be reproducible. Do you have any advice? — Jonathan Rhein, May 19 '16 at 07:32
I think you should file a bug report with the authors. Simplify your code even further. All you need to reproduce is `dat <- as.data.frame(df.combos[8]); NbClust(dat, distance="euclidean", min.nc=3, max.nc=5, method = "complete")`. The error occurs because with 4 clusters the NbClust code can't compute the `pf()` value in line 1546 of the function. — Andrie, May 19 '16 at 09:50
@Andrie, thank you! I have never filed a bug report before. Is there a good tutorial/template on how to do this? — Jonathan Rhein, May 19 '16 at 12:27
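
The workaround mentioned in the comments (leaving out the problematic indices via the `index` argument) could be sketched roughly as follows. This is an illustration under assumptions, not a tested fix: it assumes `index` accepts one index name per call, so the indices are run one at a time and any that error are recorded as `NA`. The data frame `dat` is random stand-in data so the sketch runs on its own, and the list of index names is an arbitrary subset of those in the package documentation.

```r
library(NbClust)

# Stand-in data (hypothetical); substitute the real data frame here.
set.seed(1)
dat <- data.frame(x = rnorm(60), y = rnorm(60))

# Illustrative subset of NbClust index names
indices <- c("kl", "ch", "hartigan", "silhouette", "db",
             "duda", "pseudot2", "beale")

# Run one index per call; an index that errors yields NA instead of
# aborting the whole analysis.
best_nc <- vapply(indices, function(ix) {
  tryCatch(
    {
      res <- NbClust(dat, distance = "euclidean", min.nc = 3, max.nc = 5,
                     method = "complete", index = ix)
      nc <- res$Best.nc[1]   # assumed to hold the suggested cluster count
      if (length(nc) == 1) as.numeric(nc) else NA_real_
    },
    error = function(e) NA_real_
  )
}, numeric(1))

print(best_nc)

# Tally the suggested cluster counts, keeping failed indices visible
print(table(best_nc, useNA = "ifany"))
```

Indices that show up as `NA` are the ones to leave out for that data set, and they are also useful to name in a bug report to the package authors.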
