Use custom distance in step_umap function (tidymodels)

Question

I'm trying to create a recipe (preprocessing for an XGBoost model) that uses a custom distance metric (Dice) in step_umap.

Here is my code:

Dice function and distance matrix

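# Dice dissimilarity between two binary (0/1) vectors
dice <- function(x, y) {
  n1 <- sum(x == 1 & y == 0)  # present only in x
  n2 <- sum(x == 0 & y == 1)  # present only in y
  n3 <- sum(x == 1 & y == 1)  # present in both
  (n1 + n2) / (n1 + n2 + 2 * n3)
}

X_dm <- proxy::dist(X, dice)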
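
As a quick sanity check of the Dice dissimilarity above, here is a minimal base-R sketch on a tiny made-up data set. The matrix `X_toy` and the loop-based distance computation are illustrative assumptions, not part of the original code; `proxy::dist(X_toy, dice)` would be the package-based equivalent.

```r
# Dice dissimilarity between two binary (0/1) vectors, as defined above.
dice <- function(x, y) {
  n1 <- sum(x == 1 & y == 0)  # present only in x
  n2 <- sum(x == 0 & y == 1)  # present only in y
  n3 <- sum(x == 1 & y == 1)  # present in both
  (n1 + n2) / (n1 + n2 + 2 * n3)
}

# Hypothetical toy data: 3 observations, 4 binary features.
X_toy <- rbind(
  a = c(1, 1, 0, 0),
  b = c(1, 0, 1, 0),
  c = c(1, 1, 0, 0)
)

# Pairwise dissimilarities computed with plain loops (base R only).
n <- nrow(X_toy)
D <- matrix(0, n, n, dimnames = list(rownames(X_toy), rownames(X_toy)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- dice(X_toy[i, ], X_toy[j, ])
  }
}

D["a", "b"]  # 0.5: one shared feature, one unique feature on each side
D["a", "c"]  # 0: identical rows
```

Note that two all-zero rows give 0/0, so the function returns NaN for them; real data may need a guard for that case.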

Workflow

xgb_spec <- boost_tree(
  trees = 1000,
  tree_depth = tune(), min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(), mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")



trained <- res.MCA[["call"]][["Xtot"]][indextrain,] %>%
             data.frame() %>%
             mutate(class = Y_train)

#preprocess + formula
umap_rec <-
  recipe(class ~ ., data = trained) %>%
  step_downsample(under_ratio = tune()) %>%
  step_umap(
    all_predictors(),
    outcome = "class",
    num_comp = tune(),
    neighbors = tune(),
    min_dist = tune(),
    options = list(
                  target_weight = 0.5,
                   X = X_dm)
  )

#pipeline
wf <- workflow() %>%
  add_recipe(umap_rec) %>%
  add_model(xgb_spec)

# tuning parameters
umap_param <-
  parameters(wf) %>%
  update(mtry = mtry(c(1, 86)))

xgb_grid <- grid_latin_hypercube(
  umap_param,
  size = 10
)

vb_folds <- vfold_cv(trained,v=3)
cl <- makePSOCKcluster(7)
registerDoParallel(cl)
umap_tune_grid <- wf %>%
    tune_grid(
        resamples = vb_folds,
        grid = xgb_grid,
        param_info = umap_param,
        control = control_grid(verbose = FALSE),
        metrics = metric_set(f_meas)
    )
stopCluster(cl)

But I get this error:

Error in `estimate_tune_results()`:
! All of the models failed. See the .notes column.
Backtrace:
 1. umap_tune_grid %>% select_best()
 3. tune:::select_best.tune_results(.)
 5. tune:::show_best.tune_results(x, metric = metric, n = 1)
 6. tune::estimate_tune_results(x)

The error seems to come from passing X = X_dm in the options list of step_umap(). I don't know how to make step_umap() use the custom Dice metric.

How can I do that, if it is possible at all?

Have a good day, and thanks in advance.

score 0 · Answer 1 · answered Feb 01 '23 at 04:56


The X argument is the one that holds the data set (see the docs). Passing X through options (which we should prohibit) means that the data you pass to the recipe is not used.

In other words, the recipe gives uwot::umap() the current data in the X slot and then your option overwrites it with external data.
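
The overwrite can be illustrated with a minimal base-R sketch. Both `fake_umap()` and `run_step()` are hypothetical stand-ins written for this illustration; they are not the actual code of uwot or embed, and the way the argument list is merged here is an assumption about the general mechanism only.

```r
# Hypothetical stand-in for uwot::umap(); it only reports the data it received.
fake_umap <- function(X, n_neighbors = 15, ...) {
  list(n_rows = nrow(X), n_neighbors = n_neighbors)
}

# Hypothetical stand-in for how a step might assemble its call: the step's
# own arguments first, then the user's `options` merged on top.
run_step <- function(data, neighbors, options = list()) {
  args <- list(X = data, n_neighbors = neighbors)
  args <- modifyList(args, options)  # same-named entries in `options` win
  do.call(fake_umap, args)
}

recipe_data   <- matrix(0, nrow = 100, ncol = 5)  # data the recipe supplies
external_data <- matrix(0, nrow = 7, ncol = 5)    # stands in for X_dm

ok  <- run_step(recipe_data, neighbors = 10)
bad <- run_step(recipe_data, neighbors = 10, options = list(X = external_data))

ok$n_rows   # 100: the recipe's data was used
bad$n_rows  # 7: the external object silently replaced it
```

Under resampling the same thing would happen in every fold: the rows of each analysis set are ignored in favor of the external object, so the embedding no longer lines up with the data the model is fit on.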


topepo


Yes, I know this is the wrong way to do it, so what should I do to use a custom metric in the step_umap function? – Benco016 Feb 01 '23 at 08:06

score 0 · Answer 2 · answered Feb 01 '23 at 21:13


The underlying implementation that does the UMAP calculations doesn't allow users to pass in a custom metric function. That is why step_umap() doesn't allow it either.

Cross-posted from https://github.com/tidymodels/embed/issues/155


EmilHvitfeldt

