
I wanted to understand the inner workings of SHAP by using custom functions and tokenizers. I tried running the code below, and this part kept giving me errors:

method = "custom tokenizer"

# build an explainer by passing a transformers tokenizer
if method == "transformers tokenizer":
    explainer = shap.Explainer(f, tokenizer, output_names=labels)

# build an explainer by explicitly creating a masker
elif method == "default masker":
    masker = shap.maskers.Text(r"\W") # this will create a basic whitespace tokenizer
    explainer = shap.Explainer(f, masker, output_names=labels)

# build a fully custom tokenizer
elif method == "custom tokenizer":
    import re

    def custom_tokenizer(s, return_offsets_mapping=True):
        """ Custom tokenizers conform to a subset of the transformers API.
        """
        pos = 0
        offset_ranges = []
        input_ids = []
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {}
        out["input_ids"] = input_ids
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    masker = shap.maskers.Text(custom_tokenizer)
    explainer = shap.Explainer(f, masker, output_names=labels)

And here's the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-1d4129d9e966> in <cell line: 4>()
     34         return out
     35 
---> 36     masker = shap.maskers.Text(custom_tokenizer)
     37     explainer = shap.Explainer(f, masker, output_names=labels)

1 frames
/usr/local/lib/python3.10/dist-packages/shap/utils/transformers.py in parse_prefix_suffix_for_tokenizer(tokenizer)
     89     used to slice tokens belonging to sentence after passing through tokenizer.encode().
     90     """
---> 91     null_tokens = tokenizer.encode("")
     92     keep_prefix, keep_suffix, prefix_strlen, suffix_strlen = None, None, None, None
     93 

AttributeError: 'function' object has no attribute 'encode'

Firstly, how can I resolve this error? Secondly, how can this code be adapted to work with EmoRoBERTa?
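From the traceback, `shap.maskers.Text` ends up calling `tokenizer.encode("")`, so it seems to expect an object with an `encode` method rather than a plain function. As a guess (untested, and the class name is my own), I considered wrapping the same logic in a small class:

```python
import re

class CustomTokenizer:
    """Guess at a wrapper: same tokenization logic as the function above,
    but also exposing ``encode`` since shap.utils.transformers calls
    ``tokenizer.encode("")`` internally."""

    def __call__(self, s, return_offsets_mapping=True):
        # Split on non-word characters, recording (start, end) offsets
        # for each token so SHAP can map tokens back into the string.
        pos = 0
        offset_ranges = []
        input_ids = []
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {"input_ids": input_ids}
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    def encode(self, s):
        # Delegate to __call__; for the empty string this returns [],
        # which is what the prefix/suffix parsing seems to probe for.
        return self(s)["input_ids"]
```

I don't know whether this satisfies everything SHAP expects of a tokenizer, which is partly why I'm asking.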
