I just discovered the lime package in R and am still trying to fully understand it. I'm stumped, though, by the visualization produced by `plot_features`.

Please excuse my naivety.

My question is this: is the case number for each row sequential? In other words, is case 416 equivalent to row 416 in the data? If not, how do I know which row each case number refers to?

[Figure: plot of feature weights]

Sample code to reproduce the image above:

library(MASS)
library(lime)
data(biopsy)
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
biopsy2 = data.frame(ID = 1:nrow(biopsy), biopsy)
names(biopsy2) <- c('ID','clump thickness', 'uniformity of cell size', 
                   'uniformity of cell shape', 'marginal adhesion',
                   'single epithelial cell size', 'bare nuclei', 
                   'bland chromatin', 'normal nucleoli', 'mitoses',
                   'class')
# Now we'll fit a linear discriminant model on all but 4 cases
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy2)), 4)
prediction <- biopsy2$class
biopsy2$class <- NULL
model <- lda(biopsy2[-test_set, ], prediction[-test_set])
predict(model, biopsy2[test_set, ])
explainer <- lime(biopsy2[-test_set,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(biopsy2[test_set, ], explainer, n_labels = 1, n_features = 4)
plot_features(explanation, ncol = 1)

EDIT: Added an extra column to the biopsy table called ID

OTStats
Mikee

1 Answer

As you can see in `explanation`, the plot goes through the cases in the order in which they appear in that data frame:

head(explanation[, 1:5])
      model_type case  label label_prob  model_r2
1 classification  416 benign  0.9943635 0.5432439
2 classification  416 benign  0.9943635 0.5432439
3 classification  416 benign  0.9943635 0.5432439
4 classification  416 benign  0.9943635 0.5432439
5 classification    7 benign  0.9527375 0.6586789
6 classification    7 benign  0.9527375 0.6586789

However, since each case spans multiple rows, it may help to know which rows correspond to a given case. For that you can use

which(416 == explanation$case)
# [1] 1 2 3 4

so that

explanation[which(416 == explanation$case), 1:5]
#       model_type case  label label_prob model_r2
# 1 classification  416 benign  0.9949716 0.551287
# 2 classification  416 benign  0.9949716 0.551287
# 3 classification  416 benign  0.9949716 0.551287
# 4 classification  416 benign  0.9949716 0.551287
Julius Vainora
  • thanks. My objective is for me to be able to link the explanation table to the biopsy data frame via a common key/ID. Not sure your solution will give me that functionality. I'm aware that will make the key/ID have multiple rows but I don't mind. – Mikee Jan 30 '19 at 13:54
  • @user1783739, I'm not sure where the difficulty is. `which(416 == explanation$case)` gives the rows that correspond to case 416, so `case` can be seen as this key/ID. I updated my answer a little. – Julius Vainora Jan 30 '19 at 14:08
  • @user1783739, of course you may also use `explanation[416 == explanation$case, 1:5]`, a more concise version. – Julius Vainora Jan 31 '19 at 11:37
  • thank you - leveraging on your answer and with some tinkering, I was able to merge the explanation output with my initial data frame. – Mikee Feb 01 '19 at 09:39
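
The merge mentioned in the last comment might look roughly like the sketch below. It assumes, as the lime documentation for `explain()` describes, that `case` holds the row name of the explained observation rather than its position. After `na.omit()` the row names can have gaps, so a row name and a row position (or an `ID` built with `1:nrow(...)`) may not agree. The toy data frame, the `row_key` column, and `explanation_toy` are hypothetical stand-ins so that the snippet runs without lime installed; they are not output from the code in the question.

```r
# Toy stand-in for the biopsy data: the third row has a missing value.
dat <- data.frame(x = c(5, 3, NA, 8, 1), y = c("a", "b", "c", "d", "e"))
dat <- na.omit(dat)

# Row names keep their original values, so there is a gap at "3".
print(rownames(dat))

# Hypothetical explicit key that travels with the data.
dat$row_key <- rownames(dat)

# Hypothetical stand-in for the `case` column of an explain() result:
# two feature rows for each of the cases "4" and "1".
explanation_toy <- data.frame(
  case = c("4", "4", "1", "1"),
  feature = c("x", "y", "x", "y"),
  stringsAsFactors = FALSE
)

# Join every explanation row to its source observation by row name.
linked <- merge(explanation_toy, dat, by.x = "case", by.y = "row_key")
print(linked)

# Positional and row-name lookups differ once rows have been dropped.
by_position <- dat[4, "x"]    # fourth remaining row
by_name     <- dat["4", "x"]  # row whose name is "4"
```

With the real objects, the same idea would be to store `rownames(biopsy2)` in a column before calling `explain()` and to merge on `explanation$case`. Whether `case` needs converting with `as.character()` first depends on the lime version, so it is worth checking with `str(explanation)`.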