This question has been seen 74 times and has received only one response (as of noon (PDT) Wed, Aug-14).

I've rewritten the question to make it as clear as possible, and I'd appreciate any help.

In summary, I need a small but complete example, on a dataset with a binary response, of how to use MLR's makeCostSensWeightedPairsWrapper to obtain prediction probabilities on a test set.

The MLR tutorial on cost-sensitive classification (https://mlr.mlr-org.com/articles/tutorial/cost_sensitive_classif.html) has a section on "Example-dependent misclassification costs" with an example based on the iris dataset.

In the code snippet below, I reduced the iris dataset to two classes, since I'm only interested in binary classification.

library( mlr )
set.seed( 12347 )
n1 = 100; ntrain = 70
df = iris[ 1:n1, ]  # 100 points in df so as to have two classes only (setosa and versicolor)
df$Species = factor( df$Species )  # refactor the response

# partition df into a training set (70 points) and test set (30 points)
# 
ix = sample( 1:n1, ntrain, replace=FALSE )
xtest = df[ setdiff( 1:n1, ix ), ]  ## test set
ntest = nrow( xtest )
xtrain = df[ ix, ]   # this is the training set

# create cost matrix, same as in the MLR example
#
cost = matrix(runif(ntrain * 2, 0, 2000), ntrain) * (1 - diag(2))[xtrain$Species,] + runif(ntrain, 0, 10)
colnames(cost) = levels(xtrain$Species)
rownames(cost) = rownames(xtrain)

xtrain$Species = NULL   # this is done according to the MLR example

# cost-sensitive task
#
costsens.task = makeCostSensTask(id = "xtrain", data = xtrain, cost = cost )
costsens.task

##lrn = makeLearner("classif.multinom", trace = FALSE, predict.type="prob" )
lrn = makeLearner( "classif.gbm", predict.type="prob" )
lrn = makeCostSensWeightedPairsWrapper( lrn ); lrn

mod = train(lrn, costsens.task ); mod

pred = predict( mod, newdata = xtest, pred.type="prob" );

perf = performance( pred, measures = list(auc), task = costsens.task)

# I get the following error:
#   Error in FUN(X[[i]], ...) : 
#   You need to have a 'truth' column in your pred object for measure auc!

My actual project is a binary classification that incorporates example-dependent misclassification costs.

The goal is to predict on a test dataset, obtain the class probabilities, and show the performance (using ROCR, for which MLR provides mapping functions).

NOTE: The learners I've tried are 'classif.multinom' and 'classif.gbm'; I guessed these two might be compatible with the weighted pairs wrapper.

My questions are:

Q1: Where in the code snippet, and how, do I specify that I want probabilities as the output of the cost-sensitive classifier?

Q2: Which learners can be used to produce class probabilities?

Q3: How do I avoid the error above and get the class probabilities?

Once again, I'd really appreciate any help, especially a prompt answer.

dnqxt
  • I don't know much about cost-sens applications and why you would remove the target column from the task (which then causes the error). I forwarded your questions internally. – pat-s Aug 12 '19 at 09:24
  • Many thanks for the response! Please use the current code snippet. – dnqxt Aug 12 '19 at 18:05

1 Answer

OK, after a number of days and almost a hundred views of this question (about 20 of them mine :), there has been only one comment and no answers.

From what I could understand by exploring the available MLR documentation, it seems that the example-dependent cost-sensitive method (makeCostSensWeightedPairsWrapper) outputs labels only, with no prediction probabilities.

In other words, no probabilities are available from a cost-sensitive task, only the new labels are given, which are in turn computed based on the probabilities of the base classifier.
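
If probabilities are still needed, one possible workaround is to skip the wrapper and mimic, for the two-class case, what its documentation describes: label each training row with its cheaper class, weight it by the absolute cost difference, and train an ordinary weighted classifier that can return probabilities. The following is only a sketch under that assumption. The names train.w, test.w, label and w are made up for illustration, and it has not been checked that this matches the wrapper's fitted model exactly.

```r
library(mlr)
set.seed(12347)

# Two-class subset of iris, split 70/30 as in the question.
df = droplevels(iris[1:100, ])
ix = sample(nrow(df), 70)
train.df = df[ix, ]
test.df = df[-ix, ]

# Example-dependent costs for the training rows, built as in the question.
cost = matrix(runif(70 * 2, 0, 2000), 70) * (1 - diag(2))[train.df$Species, ] + runif(70, 0, 10)
colnames(cost) = levels(train.df$Species)

# Assumption: relabel each row with its cheapest class, weight it by the cost gap.
feats = setdiff(names(df), "Species")
train.w = train.df[, feats]
train.w$label = factor(colnames(cost)[apply(cost, 1, which.min)], levels = colnames(cost))
w = abs(cost[, 1] - cost[, 2])

# An ordinary weighted classification task; a classif learner can return probabilities.
task = makeClassifTask(data = train.w, target = "label", weights = w)
lrn = makeLearner("classif.multinom", predict.type = "prob", trace = FALSE)
mod = train(lrn, task)

# The test data carries the true class under the target name,
# so the prediction object gets a 'truth' column and auc can be computed.
test.w = test.df[, feats]
test.w$label = test.df$Species
pred = predict(mod, newdata = test.w)
prob = getPredictionProbabilities(pred)  # probabilities of the positive class
perf = performance(pred, measures = auc)
```

Because this is a plain classification task, the prediction object should also work with MLR's ROC helpers, at the price of no longer being a cost-sensitive task in MLR's sense.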

So, this is the answer that I'll accept.

As for MLR errors, in this case at least, it would be helpful to get an explicit error message instead of a spurious one, or simply to note this in the documentation.
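
For the label predictions themselves, the measures the tutorial uses for cost-sensitive tasks are meancosts and mcp rather than auc. Below is a minimal sketch of a train/test evaluation along those lines. It assumes the costs for all 100 rows are kept in one task and that the split is done with the subset argument, so the measures can look up the costs of the test rows; train.idx and test.idx are illustrative names.

```r
library(mlr)
set.seed(12347)

# Two-class subset of iris with example-dependent costs for every row.
df = droplevels(iris[1:100, ])
cost = matrix(runif(100 * 2, 0, 2000), 100) * (1 - diag(2))[df$Species, ] + runif(100, 0, 10)
colnames(cost) = levels(df$Species)
rownames(cost) = rownames(df)
df$Species = NULL

# One cost-sensitive task holding both the training and the test rows.
costsens.task = makeCostSensTask(id = "iris2", data = df, cost = cost)
lrn = makeCostSensWeightedPairsWrapper(makeLearner("classif.multinom", trace = FALSE))

# Train on 70 rows, predict labels for the remaining 30.
train.idx = sample(100, 70)
test.idx = setdiff(1:100, train.idx)
mod = train(lrn, costsens.task, subset = train.idx)
pred = predict(mod, task = costsens.task, subset = test.idx)

# Cost-based measures need the task to look up the costs of the predicted rows.
perf = performance(pred, measures = list(meancosts, mcp), task = costsens.task)
```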

dnqxt