I am fairly new to R and am teaching myself some machine learning techniques. I am currently working on hyperparameter tuning and, to understand it better, I am doing the steps more manually than necessary. I use a tibble with list columns in which each row contains, among other things, a cross-validation training fold and specific hyperparameter values for a random forest. The full grid contains every unique fold/parameter combination within a specified range.
The models should be built by iterating the ranger function over all rows (i.e. fold/parameter combinations) and then saved into a list column. For this purpose I use the map family of functions from the purrr package.
The problem is that this approach only works when I map the data and a single parameter (mtry) to the ranger function using map2. I know that I need pmap when mapping more than two elements to a function. But unlike the two-element case described above, this does not work for me with the data and two parameters (mtry and min.node.size) as elements. pmap does not seem to pass the third element (min.node.size) as an argument to the ranger function, and I get the following error:
"Error in ranger(Species ~ ., data = .x, mtry = .y, min.node.size = .z) : object '.z' not found"
This is my code using the iris data set:
### used packages
library(tidyverse)
library(ranger)
library(rsample)
### data preparation
set.seed(123)
initial_split_data <- initial_split(iris, prop = 0.8)
training <- training(initial_split_data)
testing <- testing(initial_split_data)
cv_split <- vfold_cv(training, v = 3)
cv_data <- cv_split %>%
  mutate(train = map(.x = splits, .f = ~training(.x)),
         validate = map(.x = splits, .f = ~testing(.x)),
         validate_species = map(.x = validate, .f = ~.x$Species))
### modeling
## two elements being mapped works:
random_forest_model_mtry <- cv_data %>%
  crossing(mtry = seq(2, 4, 1)) %>%
  mutate(model = map2(.x = train, .y = mtry,
                      .f = ~ranger(Species ~ ., data = .x, mtry = .y)))
## three elements being mapped does not work:
random_forest_model_mtry_minnode <- cv_data %>%
  crossing(mtry = seq(2, 4, 1),
           min.node.size = seq(1, 5, 1)) %>%
  mutate(model = pmap(list(.x = train, .y = mtry, .z = min.node.size),
                      .f = ~ranger(Species ~ ., data = .x, mtry = .y, min.node.size = .z)))
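To narrow this down, here is a minimal sketch of the same pattern without ranger. The vectors first, second and third are hypothetical stand-ins for train, mtry and min.node.size. As far as I can tell from the purrr documentation, the formula shorthand only defines .x and .y as named pronouns, while ..1, ..2, ..3 refer to arguments by position, so the last two calls are my assumption about what might be the intended usage rather than something I have confirmed for the ranger case.

```r
library(purrr)

# Hypothetical toy inputs standing in for train, mtry and min.node.size.
first  <- 1:2
second <- 3:4
third  <- 5:6

# Same pattern as the failing ranger call: the formula refers to .z.
# The error is caught so the script keeps running.
failing <- tryCatch(
  pmap(list(.x = first, .y = second, .z = third), ~ .x + .y + .z),
  error = function(e) e
)

# Positional placeholders ..1, ..2, ..3 with an unnamed list (assumed usage).
positional <- pmap(list(first, second, third), ~ ..1 + ..2 + ..3)

# An ordinary function whose argument names match the list names (assumed usage).
named <- pmap(
  list(a = first, b = second, c = third),
  function(a, b, c) a + b + c
)
```

In this toy case the first call fails in the same way as my ranger call, while the other two return a list with one sum per position.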
It would be really helpful if someone could show me how to use pmap correctly in this case so that the random forest models are fitted.
Best regards