I am getting a somewhat inscrutable error every time I try to run the LIME text explainer on my BERT regression model. Basically, the BERT model produces a numerical prediction just fine for any text I supply it with, but when the LIME text explainer calls the model, it raises this error:
ValueError: only one element tensors can be converted to Python scalars
I import my libraries and model as follows:
import lime
import torch
from lime.lime_text import LimeTextExplainer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

explainer = LimeTextExplainer()

# Load the saved BERT model and tokenizer
loaded_model = AutoModelForSequenceClassification.from_pretrained("/path/to/model")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
I then define a prediction function for my BERT model:
def predict(text):
    inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt", max_length=128)
    loaded_model.eval()
    with torch.no_grad():
        outputs = loaded_model(**inputs)
    predicted_value = outputs.logits.item()
    return predicted_value
I test it and it works fine:
text_to_interpret = "We're flying high, watching the world pass us by."
predict(text_to_interpret)
>0.01548099610954523
However, when I pass this function to the LIME text explainer's explain_instance method, I get the error below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/zp/g2cg0s7d3vn0y092tw5x90rc0000gn/T/ipykernel_43525/1810001145.py in <module>
----> 1 explanation = explainer.explain_instance(text_to_interpret, predict)
~/opt/anaconda3/lib/python3.9/site-packages/lime/lime_text.py in explain_instance(self, text_instance, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
411 mask_string=self.mask_string))
412 domain_mapper = TextDomainMapper(indexed_string)
--> 413 data, yss, distances = self.__data_labels_distances(
414 indexed_string, classifier_fn, num_samples,
415 distance_metric=distance_metric)
~/opt/anaconda3/lib/python3.9/site-packages/lime/lime_text.py in __data_labels_distances(self, indexed_string, classifier_fn, num_samples, distance_metric)
480 data[i, inactive] = 0
481 inverse_data.append(indexed_string.inverse_removing(inactive))
--> 482 labels = classifier_fn(inverse_data)
483 distances = distance_fn(sp.sparse.csr_matrix(data))
484 return data, labels, distances
/var/folders/zp/g2cg0s7d3vn0y092tw5x90rc0000gn/T/ipykernel_43525/1417496559.py in predict(text)
4 with torch.no_grad():
5 outputs = loaded_model(**inputs)
----> 6 predicted_value = outputs.logits.item()
7 return predicted_value
8
ValueError: only one element tensors can be converted to Python scalars
I don't know whether this is an issue with LIME or with my BERT model. The model seems to work just fine on its own, which makes me think it's LIME. But then the error is clearly about a Torch tensor that can't be converted to a scalar.
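One way to narrow this down is to reproduce the conversion failure with PyTorch alone. The traceback shows LIME calling classifier_fn(inverse_data), where inverse_data is a list of perturbed strings, so the logits would contain one row per string rather than a single value. The sketch below uses made-up stand-in tensors and assumes the model has a single regression output (logits of shape (batch, 1)), which the post does not confirm.

```python
import torch

# Stand-in logits (values are made up). Assumed shape: one row per input
# text, one column for the single regression output.
single = torch.tensor([[0.0155]])                 # shape (1, 1): one text
batch = torch.tensor([[0.0155], [0.2], [-0.1]])   # shape (3, 1): three texts

print(single.item())  # works: the tensor holds exactly one element

try:
    batch.item()  # fails: three elements cannot collapse into one scalar
except (ValueError, RuntimeError) as err:
    # The post shows ValueError; some PyTorch versions raise RuntimeError.
    print(type(err).__name__, err)

# Converting the whole column keeps one value per text.
values = batch.squeeze(-1).tolist()
print(values)
```

If the failure reproduces this way, the error would come from calling .item() on a batch, not from LIME or the model weights.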
I've tried wrapping the predict function in a try/except and converting failed outputs into np.nan values, but that gives me other errors.
Can anyone help with this?
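A possible direction, offered as a sketch and not a confirmed fix: since the traceback shows LIME passing a list of strings to classifier_fn, the prediction function could accept a batch and return one row per text. The helper name make_batch_predict and the batch_size parameter are hypothetical, and the code assumes the model returns logits of shape (batch, 1). LimeTextExplainer is designed around class probabilities, so whether a single-column regression output with labels=(0,) gives meaningful explanations is an assumption to verify.

```python
import numpy as np
import torch


def make_batch_predict(model, tokenizer, batch_size=32):
    """Build a function mapping a list of strings to an array of shape (n, 1).

    Hypothetical helper: model and tokenizer are expected to behave like the
    loaded_model and tokenizer objects from the post.
    """

    def predict_batch(texts):
        # Accept a single string as well as the list that LIME passes in.
        if isinstance(texts, str):
            texts = [texts]
        model.eval()
        chunks = []
        with torch.no_grad():
            # Process the texts in small chunks to limit memory use.
            for start in range(0, len(texts), batch_size):
                chunk = list(texts[start:start + batch_size])
                inputs = tokenizer(chunk, padding=True, truncation=True,
                                   return_tensors="pt", max_length=128)
                logits = model(**inputs).logits  # assumed shape (len(chunk), 1)
                chunks.append(logits.reshape(len(chunk), -1).cpu().numpy())
        # One row per input text, no scalar conversion.
        return np.concatenate(chunks, axis=0)

    return predict_batch


# Possible usage with the objects from the post (not run here):
# predict_batch = make_batch_predict(loaded_model, tokenizer)
# explanation = explainer.explain_instance(
#     text_to_interpret, predict_batch, labels=(0,))  # column 0 is the only output
# explanation.as_list(label=0)
```

The chunking is a design choice to keep memory bounded, since LIME generates many perturbed samples per explanation (see the num_samples argument in the traceback signature).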