I am using LIME to compute local explanations, but I do not understand why I have to pass the training data X_train in the line of code below:
explainer = lime_tabular.LimeTabularExplainer(X_train, mode="regression", feature_names=boston.feature_names)
Below is an excerpt on how LIME operates, taken from Christoph Molnar's great book on XAI, Interpretable Machine Learning:
The recipe for training local surrogate models:
- Select your instance of interest for which you want to have an explanation of its black box prediction.
- Perturb your dataset and get the black box predictions for these new points.
- Weight the new samples according to their proximity to the instance of interest.
- Train a weighted, interpretable model on the dataset with the variations.
- Explain the prediction by interpreting the local model.
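To check my understanding of these steps, here is how I picture the recipe in code. This is only a minimal sketch with made-up names (explain_locally, a hand-picked noise scale and kernel width), not lime's actual internals:

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(x_star, predict_fn, num_samples=5000, noise_scale=1.0, kernel_width=0.75):
    """Toy version of the local-surrogate recipe above (not lime's real code)."""
    n_features = x_star.shape[0]

    # 1) + 2) Perturb around the instance of interest and query the black box.
    #         Here I simply add Gaussian noise; the noise_scale is hand-picked.
    Z = x_star + np.random.normal(scale=noise_scale, size=(num_samples, n_features))
    y_black_box = predict_fn(Z)

    # 3) Weight the perturbed samples by their proximity to x_star
    #    (an exponential kernel on the Euclidean distance).
    distances = np.linalg.norm(Z - x_star, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4) Train a weighted, interpretable (linear) model on the perturbed data.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y_black_box, sample_weight=weights)

    # 5) The coefficients of the local model are the explanation.
    return surrogate.coef_
```

In this toy version nothing plays the role of X_train, which is exactly what confuses me about the constructor call above.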
If I understand correctly, LIME trains a weighted, interpretable model for each instance of interest by sampling points from its neighbourhood. The weights (coefficients) that this local model assigns to the features serve as the local explanation for that particular instance. And this is exactly what we do in the line of code below:
exp = explainer.explain_instance(X_test.values[3], model.predict, num_features=6)
We pass an instance, and from this instance LIME generates the neighbourhood samples on which it fits the interpretable model. So why did we pass X_train in the first line of code? How LIME makes use of it is what I don't understand.
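For what it's worth, reading the local explanation off the result is clear to me, e.g. with as_list(), which as far as I know returns the (feature, weight) pairs of the fitted local model:

```python
# Inspect the local surrogate model fitted for this one instance
for feature, weight in exp.as_list():
    print(feature, weight)
```

It is only the role of the X_train passed to the constructor that I cannot place in the recipe.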