In my NLP task I want to understand the 'rules' of my classifier. For that purpose, I built a LimeTextExplainer:
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

c = make_pipeline(cv, naive_bayes)
explainer = LimeTextExplainer(class_names=class_names, random_state=42, bow=False)
exp = explainer.explain_instance(X_test[i], c.predict_proba, num_features=20)
fig = exp.as_pyplot_figure()
The above code creates a nice list of unigrams, exactly as I wanted.
As a next step I want to do the same, but with bigrams. I changed the feature extractor to compute only bigrams:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(strip_accents='ascii', analyzer='word',
                     token_pattern=u'(?ui)\\b\\w*[a-z]+\\w*\\b',
                     lowercase=True, stop_words='english',
                     ngram_range=(2, 2), max_features=None)
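For reference, my understanding is that ngram_range=(2, 2) restricts the vocabulary to adjacent word pairs only. A minimal pure-Python sketch of what that extraction does (word_bigrams is a hypothetical helper for illustration, not scikit-learn code):

```python
def word_bigrams(text):
    """Return the adjacent word pairs of a text, lowercased,
    roughly mimicking CountVectorizer's ngram_range=(2, 2)."""
    tokens = text.lower().split()
    # zip the token list against itself shifted by one to get pairs
    return [' '.join(pair) for pair in zip(tokens, tokens[1:])]

print(word_bigrams("Natural language processing task"))
# every feature is a two-word string such as "natural language"
```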
The problem(s):
- I use the same LimeTextExplainer code as above, but the graph still shows only unigrams, even though the vectorizer now produces only bigrams.
- As a side question: does the horizontal axis of the graph display the absolute contribution of each word to the classification probability? For instance, if the text's probability for class X is 0.67, does 'recognit' account for ~0.009 and 'langugage' for ~0.007 of that 0.67?
Thanks in advance!