This question has been viewed 74 times and has received only one response (as of noon PDT on Wed, Aug 14).
I've rewritten the question to make it as clear as possible, and I'd appreciate any help.
In summary, I need a small but complete example, on a dataset with a binary response, of how to use MLR's makeCostSensWeightedPairsWrapper to obtain predicted probabilities on a test set.
The cost-sensitive classification part of the MLR tutorial (https://mlr.mlr-org.com/articles/tutorial/cost_sensitive_classif.html) has a section on "Example-dependent misclassification costs" with an example based on the iris dataset.
In the code snippet below, I reduced the iris dataset to two classes, since I'm only interested in binary classification.
library( mlr )
set.seed( 12347 )
n1 = 100; ntrain = 70
df = iris[ 1:n1, ] # 100 points in df so as to have two classes only (setosa and versicolor)
df$Species = factor( df$Species ) # refactor the response
# partition df into a training set (70 points) and test set (30 points)
#
ix = sample( 1:n1, ntrain, replace=FALSE )
xtest = df[ setdiff( 1:n1, ix ), ] ## test set
ntest = nrow( xtest )
xtrain = df[ ix, ] # this is the training set
# create cost matrix, same as in the MLR example
#
cost = matrix(runif(ntrain * 2, 0, 2000), ntrain) * (1 - diag(2))[xtrain$Species,] + runif(ntrain, 0, 10)
colnames(cost) = levels(xtrain$Species)
rownames(cost) = rownames(xtrain)
xtrain$Species = NULL # this is done according to the MLR example
# cost-sensitive task
#
costsens.task = makeCostSensTask(id = "xtrain", data = xtrain, cost = cost )
costsens.task
##lrn = makeLearner("classif.multinom", trace = FALSE, predict.type="prob" )
lrn = makeLearner( "classif.gbm", predict.type="prob" )
lrn = makeCostSensWeightedPairsWrapper( lrn ); lrn
mod = train(lrn, costsens.task ); mod
pred = predict( mod, newdata = xtest, pred.type="prob" );
perf = performance( pred, measures = list(auc), task = costsens.task)
# I get the following error:
# Error in FUN(X[[i]], ...) :
# You need to have a 'truth' column in your pred object for measure auc!
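To make the cost line above easier to follow, here is a small base-R sketch of the same construction on four made-up labels. The names truth, mask and cheapest are mine, for illustration only. As far as I can tell, the mask zeroes out the random cost in the column of the true class, and the final runif term adds the same small amount (between 0 and 10) to both columns of a row, so the true class should always be the cheaper one.

```r
set.seed(1)

# Four made-up labels standing in for xtrain$Species (hypothetical data)
truth = factor(c("setosa", "versicolor", "versicolor", "setosa"))
n = length(truth)

# (1 - diag(2)) is a 2 x 2 matrix with 0 on the diagonal and 1 elsewhere.
# Indexing its rows by the factor picks c(0, 1) for the first level and
# c(1, 0) for the second, i.e. 0 in the column of the true class.
mask = (1 - diag(2))[truth, ]

# Large random cost (0 to 2000) for the wrong class only, plus a small
# per-row cost (0 to 10) that is added to both columns of that row.
cost = matrix(runif(n * 2, 0, 2000), n) * mask + runif(n, 0, 10)
colnames(cost) = levels(truth)

# The cheapest column in each row should coincide with the true label
cheapest = colnames(cost)[apply(cost, 1, which.min)]
print(cost)
print(data.frame(truth = truth, cheapest = cheapest))
```

So each row of cost holds one small value (the cost of predicting the true class) and one usually much larger value (the cost of predicting the other class).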
My actual project is a binary classification that incorporates example-dependent misclassification costs.
The goal is to predict on a test dataset, obtain the class probabilities, and report the performance (using ROCR, for which MLR provides mapping functions).
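For comparison, here is a minimal sketch of the kind of output I'm after, using an ordinary (cost-insensitive) classification task instead of the cost-sensitive one. The choice of 'classif.lda' is only an assumption for illustration and needs the MASS package; it is not a claim about which learners work with the weighted pairs wrapper. In this sketch xtest keeps its Species column, so the prediction object should carry the truth column that auc asks for.

```r
library(mlr)  # 'classif.lda' additionally assumes the MASS package is installed

set.seed(12347)
df = iris[1:100, ]                  # setosa and versicolor only
df$Species = factor(df$Species)     # drop the unused third level
ix = sample(1:100, 70, replace = FALSE)
xtrain = df[ix, ]
xtest = df[-ix, ]                   # Species is kept in the test set

# Ordinary classification task: the target column stays in the data
task = makeClassifTask(id = "xtrain", data = xtrain, target = "Species")

# predict.type is set on the learner, not in the call to predict()
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)

pred = predict(mod, newdata = xtest)
probs = getPredictionProbabilities(pred)  # probability of the positive class
perf = performance(pred, measures = auc)

head(as.data.frame(pred))
perf
```

From here, asROCRPrediction(pred) should hand the result over to ROCR. What I can't work out is how to get the equivalent of probs and perf once the learner is wrapped with makeCostSensWeightedPairsWrapper and trained on the cost-sensitive task.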
NOTE: The learners I've tried are 'classif.multinom' and 'classif.gbm'; I guessed that these two might be compatible with the weighted pairs wrapper.
My questions are:
Q1: Where in the code snippet, and how, do I specify that I want probabilities as the output of the cost-sensitive classifier?
Q2: Which learner can be used to produce class probabilities?
Q3: How do I avoid the error above and get the class probabilities?
Once again, I'd really appreciate any help, all the more so if someone can answer promptly.