
Is it possible to retrieve the distance matrix from the kknn model when using mlr package in R and cross validation?

library("mlr")

data(iris)

task = makeClassifTask(data = iris, target = "Species")

lnr = makeLearner(
  cl = "classif.kknn",
  predict.type = "prob",
  k = 5,
  kernel = "gaussian",
  scale = TRUE
)

cv = crossval(
  learner = lnr,
  task = task,
  iters = 4,
  stratify = TRUE,
  measures = acc,
  show.info = FALSE,
  models = TRUE
)

str(cv$models[1])

I can't see anything related in cv$models or cv$pred.

JimBoy
  • Also, your code uses `task = task` but you do not show us how you generated `task`. Is the data hidden in `task`? – G5W Aug 06 '17 at 13:36
  • Thank you for the feedback. I just wanted to sketch out the `crossval` function, because `mlr` is highly standardized. The data import step is always the same, which is why it is omitted here. – JimBoy Aug 06 '17 at 13:44

1 Answer


The return value of crossval is a ResampleResult, which contains the models fitted in the individual iterations in the $models member (note that this is a list, and that it is only populated when models = TRUE). The models are the objects returned by the underlying learner, so each model should have a member $D that contains the distance matrix.

See the tutorial for more information.

Edit: In this particular case, you don't get the learner models in the usual place because kknn is a (model-less) lazy learner and the kknn function doesn't actually get called by mlr until you predict. The "model" returned by train is just the training data (with a few additional bits).
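To see this for yourself, a minimal sketch along the following lines should work, assuming the mlr and kknn packages are installed. It trains the learner on the full iris task and inspects what mlr stores as the learner model. The variable names `mod` and `lm` are only illustrative.

```r
library("mlr")

task = makeClassifTask(data = iris, target = "Species")
lnr = makeLearner("classif.kknn", predict.type = "prob",
  k = 5, kernel = "gaussian", scale = TRUE)

# Train on the whole task; for classif.kknn this does not call kknn() yet
mod = train(lnr, task)

# Extract the object that mlr stores as the "model"
lm = getLearnerModel(mod)

# List the stored components; $data holds the training data
print(names(lm))
print(dim(lm$data))

# Check whether the stored object is a fitted kknn object
print(inherits(lm, "kknn"))
```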

The predict function returns just the predictions and not the model, so unfortunately in this particular case you can't get to the distance matrices directly. However, you can get the learner model from mlr and call kknn on that to get the distance matrices:

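library("kknn")  # mlr only calls kknn internally, so attach it to call kknn() directly
# Run kknn on the first fold's training data and extract the distances
kknn(getTaskFormula(cv$models[[1]]$task.desc),
  train = cv$models[[1]]$learner.model$data,
  test = iris)$D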
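The call above uses all of iris as test data and kknn's default parameters. If you want the distances for the observations that were actually held out in each fold, computed with the learner's own settings, something like the following sketch may help. `fold_distances` is a hypothetical helper, not part of mlr, and it assumes that `cv$pred$data` carries the usual `id` and `iter` columns of a resample prediction. According to the kknn documentation, `$D` holds the distances to the k nearest neighbours of each test row, not a full pairwise matrix.

```r
library("mlr")
library("kknn")

set.seed(1)  # the CV splits are random; fix the seed for reproducibility

task = makeClassifTask(data = iris, target = "Species")
lnr = makeLearner("classif.kknn", predict.type = "prob",
  k = 5, kernel = "gaussian", scale = TRUE)
cv = crossval(lnr, task, iters = 4, stratify = TRUE,
  measures = acc, models = TRUE, show.info = FALSE)

# Hypothetical helper: call kknn() for fold i on that fold's held-out rows
fold_distances = function(i) {
  mod = cv$models[[i]]
  # Row ids of the observations predicted in fold i
  test_ids = cv$pred$data$id[cv$pred$data$iter == i]
  args = c(
    list(
      formula = getTaskFormula(mod$task.desc),
      train = getLearnerModel(mod)$data,
      test = iris[test_ids, ]
    ),
    getHyperPars(lnr)  # k, kernel and scale as set on the learner
  )
  do.call(kknn::kknn, args)$D
}

# One matrix of neighbour distances per CV iteration
D_by_fold = lapply(seq_along(cv$models), fold_distances)
str(D_by_fold)
```

Passing the hyperparameters through `getHyperPars` keeps the direct kknn call in line with the learner's configuration, so the distances should correspond to the neighbours behind the cross-validated predictions.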
Lars Kotthoff
  • Thank you, Lars. I already checked `str(cv$models[1])`, but I don't see a member `$D` in the list. Am I doing something wrong? – JimBoy Aug 06 '17 at 17:50
  • Can you please post your complete code and data so that we can reproduce the problem? – Lars Kotthoff Aug 06 '17 at 17:54
  • Thank you very much, Lars! Though, it's a bit unfortunate that you can't directly access the distance matrix. I will see what I can do about it. – JimBoy Aug 06 '17 at 18:39
  • The underlying issue is that the distance matrix only becomes available when the test data is given. So there's no way to get it in the train function, and for predict mlr doesn't have a concept of more than just predictions being returned. You can have a look at the source code though (it's only a few lines for predict) and adapt accordingly for a special function you can use in your case. – Lars Kotthoff Aug 06 '17 at 18:43