
I would like to optimize the hyperparameters of ctree() (random forest). I am using the function tuneRF to do so, and I get the following error:

Error in if (n == 0) stop("data (x) has 0 rows") : 
  argument is of length zero

Here is my code:

library(party)
library(randomForest)
library(mlbench)
library(data.table)  # provides fread()
dat1 <- fread('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', stringsAsFactors = TRUE)

## split the data into training and test sets
set.seed(123)
dat1 <- subset(dat1, !is.na(V1))
smp_size <- 92
train_ind <- sample(seq_len(nrow(dat1)), size = smp_size)
train <- dat1[train_ind, ]
test <- dat1[-train_ind, ]

ct <- ctree(V1 ~ ., data = train)

And here is my attempt at finding mtry:

bestmtry <- tuneRF(train$V1,  train[V2:V9], stepFactor=1.5, improve=1e-5, ntree=500)
Martijn Pieters
Avi

1 Answer


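There were a couple of issues: the arguments were in the wrong order (`tuneRF` takes the predictors `x` first and the response `y` second), and the predictor columns have to be selected in the column position, `train[, V2:V9]`, since `train[V2:V9]` without the comma is read as a row subset. With both fixed,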

bestmtry <- tuneRF(train[, V2:V9], train$V1, stepFactor = 1.5, improve = 1e-5, ntree = 500)

works.
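The error message itself follows from the swapped arguments: with a plain vector in the `x` position, `nrow(x)` is `NULL`, so the `n == 0` check has nothing to compare. Below is a minimal sketch of the corrected call pattern. It uses the built-in `iris` data as a stand-in for the abalone sample (an assumption, made only so the snippet runs without a download) and spells out the `tuneRF` arguments `ntreeTry`, `trace`, and `plot`.

```r
library(randomForest)

# Stand-in data (assumption): iris replaces the abalone training sample
x <- iris[, 1:4]    # predictors: a data frame, so nrow(x) is defined
y <- iris$Species   # response: a factor, so tuneRF treats this as classification

# nrow() of a plain vector or factor is NULL; passing one as x is what
# triggers "argument is of length zero" in the row-count check
is.null(nrow(y))

set.seed(123)
# Predictors first, response second
res <- tuneRF(x, y, stepFactor = 1.5, improve = 1e-5, ntreeTry = 500,
              trace = FALSE, plot = FALSE)
res  # matrix with columns "mtry" and "OOBError"
```

In the original call, `ntree = 500` is most likely picked up as `ntreeTry` through R's partial argument matching, so writing `ntreeTry` explicitly should behave the same while being easier to read.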

Julius Vainora
  • Thanks @JuliusVainora! Does it mean that mtry = 6 is the optimal value? – Avi Nov 25 '18 at 19:42
  • To answer that, what's `smp_size`? It's not defined in your example. – Julius Vainora Nov 25 '18 at 19:43
  • Please elaborate on what you mean by smp_size. I would like to increase the accuracy of the ctree model by tuning all the hyperparameters I can... – Avi Nov 25 '18 at 19:45
  • I mean that your example is not reproducible because `smp_size`, which is present in your code, isn't defined. For this reason, I cannot run `tuneRF` and answer your question about `mtry`. – Julius Vainora Nov 25 '18 at 19:46
  • smp_size =92 but it can be changed. – Avi Nov 25 '18 at 19:55
  • Practically, you can omit smp_size – Avi Nov 25 '18 at 19:56
  • When I run your code with `smp_size <- 92`, then only 2, 3, and 4 as values of `mtry` are considered, of which 3 is the best. – Julius Vainora Nov 25 '18 at 19:57
  • These are the results I get: `mtry = 2 OOB error = 58.7%`; `Searching left ...`; `Searching right ...`; `mtry = 3 OOB error = 57.61%`; `0.01851852 1e-05`; `mtry = 4 OOB error = 55.43%`; `0.03773585 1e-05`; `mtry = 6 OOB error = 55.43%`; `0 1e-05`. Does this mean mtry = 6 is optimal? – Avi Nov 25 '18 at 20:05
  • The lowest OOB error means the optimal choice. It looks like 4 and 6 both give the same 55.43%. In any case, that's not what your original question is about. – Julius Vainora Nov 25 '18 at 20:08
  • Thanks a lot again. Are there additional parameters I can optimize/tune by using some other functions? – Avi Nov 25 '18 at 20:11
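
As a small illustration of the "lowest OOB error" rule from the comments, the sketch below picks `mtry` from a results matrix shaped like the one `tuneRF` returns (columns `mtry` and `OOBError`). The numbers are copied by hand from the output quoted in the comments; the tie-breaking toward the smaller `mtry` is a choice made here, not something `tuneRF` prescribes.

```r
# OOB errors copied from the tuneRF output quoted in the comments above
res <- cbind(mtry     = c(2, 3, 4, 6),
             OOBError = c(0.5870, 0.5761, 0.5543, 0.5543))

# which.min() returns the first minimum, so a tie goes to the smaller mtry
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
best_mtry
```

With these figures the selection lands on `mtry = 4`, because 4 and 6 tie and the first one wins. Alternatively, `tuneRF(..., doBest = TRUE)` returns a forest fitted with the `mtry` it found to be best.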