I wanted to understand how SHAP works by using custom functions and tokenizers. When I tried running it, this part kept raising errors:
method = "custom tokenizer"

# build an explainer by passing a transformers tokenizer
if method == "transformers tokenizer":
    explainer = shap.Explainer(f, tokenizer, output_names=labels)

# build an explainer by explicitly creating a masker
elif method == "default masker":
    masker = shap.maskers.Text(r"\W") # this will create a basic whitespace tokenizer
    explainer = shap.Explainer(f, masker, output_names=labels)

# build a fully custom tokenizer
elif method == "custom tokenizer":
    import re

    def custom_tokenizer(s, return_offsets_mapping=True):
        """ Custom tokenizers conform to a subset of the transformers API.
        """
        pos = 0
        offset_ranges = []
        input_ids = []
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {}
        out["input_ids"] = input_ids
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    masker = shap.maskers.Text(custom_tokenizer)
    explainer = shap.Explainer(f, masker, output_names=labels)

And here's the error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-1d4129d9e966> in <cell line: 4>()
     34         return out
     35
---> 36     masker = shap.maskers.Text(custom_tokenizer)
     37     explainer = shap.Explainer(f, masker, output_names=labels)

1 frames
/usr/local/lib/python3.10/dist-packages/shap/utils/transformers.py in parse_prefix_suffix_for_tokenizer(tokenizer)
     89     used to slice tokens belonging to sentence after passing through tokenizer.encode().
     90     """
---> 91     null_tokens = tokenizer.encode("")
     92     keep_prefix, keep_suffix, prefix_strlen, suffix_strlen = None, None, None, None
     93

AttributeError: 'function' object has no attribute 'encode'

First, how can I resolve this error? Second, I would like to replicate this code for EmoRoBERTa; how can that be done?
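For reference, one direction suggested by the traceback alone: SHAP calls `tokenizer.encode("")`, which a plain function cannot provide. A minimal sketch of a possible workaround is to wrap the same tokenizing logic in a callable class that also exposes `encode`. The class name `CustomTokenizer` and the `decode` helper are hypothetical, and whether the installed SHAP version needs further parts of the transformers API is an assumption I have not verified.

```python
import re


class CustomTokenizer:
    """Callable tokenizer that also exposes encode()/decode().

    Assumption: the masker needs only __call__ plus the methods below;
    some SHAP versions may require more of the transformers API.
    """

    def __call__(self, s, return_offsets_mapping=True):
        pos = 0
        offset_ranges = []
        input_ids = []
        # Split on every non-word character, as in the original function.
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        # Keep the trailing token if the string does not end on a separator.
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {"input_ids": input_ids}
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    def encode(self, s):
        # The traceback shows tokenizer.encode("") being called;
        # return only the token list, with no special tokens added.
        return self(s, return_offsets_mapping=False)["input_ids"]

    def decode(self, ids):
        # Hypothetical helper: join tokens back with single spaces.
        return " ".join(ids)


custom_tokenizer = CustomTokenizer()
print(custom_tokenizer("I love SHAP"))
# {'input_ids': ['I', 'love', 'SHAP'], 'offset_mapping': [(0, 1), (2, 6), (7, 11)]}
print(custom_tokenizer.encode(""))
# []
```

With this object, the call `shap.maskers.Text(custom_tokenizer)` from the snippet above would receive something that has an `encode` attribute, so it should at least get past the line shown in the traceback. Whether the rest of the masker then accepts it may depend on the SHAP version.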