
I have generated a nested resampling object with the following code:


library(mlr3verse)  # loads mlr3, mlr3learners, mlr3fselect, etc.

data <- read.csv("Data.csv", row.names = 1)
data$factor <- as.factor(data$factor)

set.seed(123, "L'Ecuyer")

task = as_task_classif(data, target = "factor")
learner = lrn("classif.ranger", importance = "impurity", num.trees = 10000)
measure = msr("classif.fbeta", beta = 1)
terminator = trm("none")

# inner loop: feature selection via RFE, evaluated by repeated CV
resampling_inner = rsmp("repeated_cv", folds = 10, repeats = 10)

at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  store_models = TRUE)

# outer loop: unbiased performance estimate of the whole pipeline
resampling_outer = rsmp("repeated_cv", folds = 10, repeats = 10)

rr = resample(task, at, resampling_outer)

I have a .csv file with the factor variable permuted/randomized, and I would like to apply the models from the nested resampling paradigm to this dataset so I can demonstrate the difference in model performance between the real dataset and the permuted/randomized one. I want this as a validation of predictive performance because, when sample sizes are small (which is common in biological contexts), prediction accuracy by chance alone can approach 70% or higher, as shown in this paper (https://pubmed.ncbi.nlm.nih.gov/25596422/).
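For reference, such a permuted dataset can also be produced in R itself by shuffling the factor column; a minimal sketch (the output file name is just a placeholder):

# shuffle the class labels to break the label-feature association
data <- read.csv("Data.csv", row.names = 1)
permuted <- data
permuted$factor <- sample(permuted$factor)
write.csv(permuted, "Data_permuted.csv")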

How would I do this using the resample object (rr)?

2 Answers


I think I figured out how to do it (do let me know if I went wrong somewhere):


library(mlr3verse)  # loads mlr3, mlr3learners, mlr3fselect, etc.

data <- read.csv("Data.csv", row.names = 1)
data$factor <- as.factor(data$factor)

# dataset with the permuted/randomized factor variable
# (file name is a placeholder -- point this at your permuted .csv)
permuted <- read.csv("Data_permuted.csv", row.names = 1)
permuted$factor <- as.factor(permuted$factor)

set.seed(123, "L'Ecuyer")

# distinct task ids make the two tasks easy to tell apart in the results
task1 = as_task_classif(data, target = "factor", id = "real")
task2 = as_task_classif(permuted, target = "factor", id = "permuted")
task_list = list(task1, task2)

learner = lrn("classif.ranger", importance = "impurity", num.trees = 10000)
measure = msr("classif.fbeta", beta = 1)
terminator = trm("none")

resampling_inner = rsmp("repeated_cv", folds = 10, repeats = 10)

at = AutoFSelector$new(
  learner = learner,
  resampling = resampling_inner,
  measure = measure,
  terminator = terminator,
  fselector = fs("rfe", n_features = 1, feature_fraction = 0.5, recursive = FALSE),
  store_models = TRUE)

resampling_outer = rsmp("repeated_cv", folds = 10, repeats = 10)

# one row per (task, learner, resampling) combination
design = benchmark_grid(tasks = task_list, learners = at, resamplings = resampling_outer)
bmr = benchmark(design, store_models = TRUE)
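If this is right, the real-vs-permuted comparison can then be read off the BenchmarkResult; a minimal sketch using its standard methods:

# mean F-beta per task over all outer iterations; the "permuted" row
# estimates the performance attainable by chance with this exact pipeline
bmr$aggregate(measure)

# per-iteration scores, e.g. to compare the two score distributions
scores = bmr$score(measure)
boxplot(classif.fbeta ~ task_id, data = scores)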


Am I right in assuming that you have two tasks t1 and t2, where task t2 is permuted, and you want to compare the performance of a learner on these two tasks?

The way to go then is to use the benchmark() function instead of the resample() function. You would have to create two different tasks (one permuted and one not permuted). You might find the section Resampling and Benchmarking in our book helpful.
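An AutoFSelector is itself a Learner, so the nested setup from the question can be passed to benchmark() directly; a minimal sketch, assuming two tasks task1 (original) and task2 (permuted) plus the at, resampling_outer, and measure objects from the question:

# 'at' wraps the inner resampling, so each outer fold of the benchmark
# runs the full feature selection internally
design = benchmark_grid(
  tasks = list(task1, task2),
  learners = at,
  resamplings = resampling_outer
)
bmr = benchmark(design, store_models = TRUE)
bmr$aggregate(measure)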

Sebastian
  • Yeah, this sounds like what I want to do. Can I feed a nested resampling call into the benchmark function? I have been reading but am unsure how to build the benchmark grid using the AutoFSelector with inner and outer resampling, as in my code above with the resample command. – abadgerw123 Feb 13 '23 at 10:57