
I am running a Keras neural network in R, and it works.

The fitted network has 1,441 parameters.

There are 22 input variables, with 11,580 observations in the training set and 19,659 in the test set.

Now I am trying to investigate which variables are important for the network. To do this I am using the lime package.


library(lime)

# Tell lime this is a regression model: lime dispatches on the S3 class of
# the model, so the method must be named model_type.<class>.
model_type.keras.engine.sequential.Sequential <- function(x, ...) {
  "regression"
}

# Custom prediction method for the Keras model
predict_model.keras.engine.sequential.Sequential <- function(x, newdata, type, ...) {
  pred <- predict(object = x, x = as.matrix(newdata))
  data.frame(Positive = pred, Negative = 1 - pred)
}

predict_model(x       = model_nn,
              newdata = q,
              type    = 'raw')



explainer <- lime::lime(
  x     = x,
  model = model_nn,
  bin_continuous = FALSE
)



explanation <- explain(
  q,                      # the full test set
  explainer    = explainer,
  # n_labels   = 1,       # for classification: explain a single class
  n_features   = 2,       # return the top two features for each case
  kernel_width = 0.5)     # shrinks the localized evaluation to raise model_r2

Here q is my test set and x is my training set, both defined as data frames.

This takes a really long time to run, so long that I have to stop it.
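To check whether the runtime is simply proportional to the number of cases, one thing I could try is timing explain() on a handful of rows first (a sketch, assuming q and explainer are defined as above):

```r
library(lime)

# Explain only a few test cases and time it. If 5 rows already take
# minutes, explaining all 19,659 rows will be impractical, since lime's
# runtime grows roughly linearly with the number of cases.
small_q <- q[1:5, ]  # first 5 rows of the test set

timing <- system.time(
  explanation_small <- explain(
    small_q,
    explainer    = explainer,
    n_features   = 2,
    kernel_width = 0.5
  )
)
print(timing)
```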

Is this runtime expected, or did I make a mistake in the code?

oxguru
  • Lime is definitely slow. You could make a model with a really small dataset to make sure your code works, but I don't see anything that sticks out. – Kat Apr 25 '22 at 18:30
  • Should the really small dataset be used to train the model, or just for using in the lime explanation part? – oxguru Apr 26 '22 at 14:17
  • I tried with smaller dataset. It now gives the error: ``` Error in `$<-.data.frame`(`*tmp*`, "prediction", value = c(Positive = 0.0105328923091292, : replacement has 3608 row, data has 1804 ``` – oxguru Apr 26 '22 at 14:39
  • Can you give me a reproducible example? I don't think it will do you a lot of good if I make up my own data. You could use something like `dput(head(dataobject, 200))`. I only go that high (200 rows) because you have so many variables. Dimensionality and neural networks... they can be fickle! Alternatively, you could add `dput(modelObject)` to your question. That might be easier; I don't have all the details on how you scaled and split your data. – Kat Apr 26 '22 at 16:02

0 Answers