I would like to build a CPO that replicates the mlr::makeClassificationViaRegressionWrapper wrapper. The wrapper fits a regression model whose target encodes whether an example belongs to the positive class (1) or not (-1). It also calculates predicted probabilities using a softmax.
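To make the two steps concrete, here is a minimal base-R sketch of the idea, independent of mlr and mlrCPO. The helper names `encode_pm1` and `softmax_pm` are made up for illustration, and the softmax shown is a two-class softmax over the score pair (s, -s), which is one plausible reading of "softmax" here and not necessarily what the wrapper computes internally.

```r
# Hypothetical helper: encode a two-class factor as +1 / -1 regression targets.
encode_pm1 <- function(y, positive) {
  ifelse(y == positive, 1, -1)
}

# Hypothetical helper: two-class softmax over the score pair (s, -s).
# Assumption for illustration only; not taken from the mlr source.
softmax_pm <- function(s) {
  e_pos <- exp(s)
  e_neg <- exp(-s)
  cbind(pos = e_pos / (e_pos + e_neg), neg = e_neg / (e_pos + e_neg))
}

y <- factor(c("M", "R", "M"))
encode_pm1(y, positive = "M")   # numeric targets for the regression learner
softmax_pm(c(-1, 0, 1))         # one row of class probabilities per score
```

A score of 0 maps to equal probabilities for both classes, and each row sums to 1.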

After reading the documentation and vignettes for makeCPOTargetOp, my attempt is as follows:

cpoClassifViaRegr = makeCPOTargetOp(
  cpo.name = 'ClassifViaRegr', 
  dataformat = 'task', #Not sure - will this work if input is df with unknown target values?
  # properties.data = c('numerics', 'factors', 'ordered', 'missings'), #Is this needed?
  properties.adding = 'twoclass', #See https://mlrcpo.mlr-org.com/articles/a_4_custom_CPOs.html#task-type-and-conversion
  properties.needed = character(0),
  properties.target = c('classif', 'twoclass'), 
  task.type.out = 'regr',
  predict.type.map = c(response = 'response', prob = 'response'), 
  constant.invert = TRUE, 
  cpo.train = function(data, target) {
    getTaskDesc(data)
  }, 
  cpo.retrafo = function(data, target, control) {
    cat(class(target))
    td = getTaskData(target, target.extra = TRUE)
    target.name = paste0(control$positive, ".prob")

    data = td$data
    # Encode the positive class as 1 and the negative class as -1
    data[[target.name]] = ifelse(td$target == control$positive, 1, -1)

    makeRegrTask(id = paste0(getTaskId(target), control$positive, '.'), 
                 data = data,
                 target = target.name,
                 weights = target$weights,
                 blocking = target$blocking)

  }, 
  cpo.train.invert = NULL, #Since constant.invert = T
  cpo.invert = function(target, control.invert, predict.type) {


    if(predict.type == 'response') {

      # Positive score -> positive class, otherwise negative class
      factor(ifelse(target > 0, control.invert$positive, control.invert$negative))

    } else {

      levs = c(control.invert$positive, control.invert$negative)
      propVectorToMatrix(vnapply(target, function(x) exp(x) / sum(exp(x))), levs)

    }

  })

It seems to work as expected. The demo below shows that the inverted prediction is identical to the prediction obtained with makeClassificationViaRegressionWrapper:

library(mlr)
library(mlrCPO)

lrn = makeLearner("regr.lm")

# Wrapper -----------------------------------------------------------------

lrn2 = makeClassificationViaRegressionWrapper(lrn)
model = train(lrn2, sonar.task, subset = 1:140)
predictions = predict(model, newdata = getTaskData(sonar.task)[141:208, 1:60])


# CPO ---------------------------------------------------------------------

sonar.train = subsetTask(sonar.task, 1:140)
sonar.test = subsetTask(sonar.task, 141:208)

trafd = sonar.train %>>% cpoClassifViaRegr()
mod = train(lrn, trafd)
retr = sonar.test %>>% retrafo(trafd)
pred = predict(mod, retr)
invpred = invert(inverter(retr), pred)

identical(predictions$data$response, invpred$data$response)

The problem is that after the CPO has converted the task from twoclass to regr, there is no way for me to specify predict.type = 'prob'. In the case of the wrapper, the properties of the base regr learner are modified to accept predict.type = 'prob' (see here). The CPO cannot modify the learner in this way, so how can I tell my model to return predicted probabilities instead of the predicted response?

I was thinking I could specify an include.prob parameter, i.e. cpoClassifViaRegr(include.prob = TRUE). If set to TRUE, cpo.invert would return the predicted probabilities in addition to the predicted response. Would something like this work?
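To illustrate what I mean, here is a rough base-R sketch of the inversion step on its own, outside of mlrCPO. The function `invert_scores` is hypothetical, the `prob.<level>` column names are only meant to resemble a classification prediction, and the probabilities come from a two-class softmax over (s, -s), which is my assumption. I do not know whether cpo.invert is allowed to return something of this shape.

```r
# Hypothetical stand-alone version of the inversion step.
# scores: numeric regression predictions; positive/negative: class labels.
invert_scores <- function(scores, positive, negative, include.prob = FALSE) {
  levs <- c(positive, negative)
  # Positive score -> positive class, otherwise negative class
  response <- factor(ifelse(scores > 0, positive, negative), levels = levs)
  if (!include.prob) {
    return(data.frame(response = response))
  }
  # Two-class softmax over (s, -s); equals exp(s) / (exp(s) + exp(-s)).
  # Assumption for illustration, not taken from the mlr source.
  p_pos <- 1 / (1 + exp(-2 * scores))
  out <- data.frame(p_pos, 1 - p_pos, response)
  names(out) <- c(paste0("prob.", levs), "response")
  out
}

res <- invert_scores(c(-0.3, 0.8), "M", "R", include.prob = TRUE)
print(res)
```

With include.prob = FALSE the function returns only the response column, which matches what my current cpo.invert produces for predict.type = 'response'.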

user51462