I have a corpus with 3 classes and am trying to work out which features are indicative of each class.
I took a one-vs-rest approach with an SVM, training three binary classifiers: class 1 versus 2+3, then 2 versus 1+3, and finally 3 versus 1+2. For each classifier, I then read the feature importances from the coef_ attribute. I also tried an out-of-the-box explainer like SHAP, but it is much too slow for the size of my data.
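For concreteness, here is a minimal sketch of that one-vs-rest setup. It assumes scikit-learn's LinearSVC (coef_ is only available for a linear kernel) and uses synthetic data from make_classification as a stand-in for the corpus; the sample and feature counts, the standardization step, and the helper name one_vs_rest_coefs are illustrative assumptions, not details of the real pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the corpus: 3 classes, 20 features (hypothetical sizes).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
# Standardize so that coefficient magnitudes are comparable across features.
X = StandardScaler().fit_transform(X)


def one_vs_rest_coefs(X, y):
    """Fit one binary linear SVM per class (class vs. rest) and return its weights."""
    coefs = {}
    for cls in np.unique(y):
        clf = LinearSVC(C=1.0, dual=False, max_iter=5000, random_state=0)
        # Label 1 = "this class", label 0 = "all other classes".
        clf.fit(X, (y == cls).astype(int))
        # For a binary problem coef_ has shape (1, n_features); flatten it.
        coefs[int(cls)] = clf.coef_.ravel()
    return coefs


coefs = one_vs_rest_coefs(X, y)
for cls, w in coefs.items():
    # Rank features by absolute weight; the sign is kept for inspection.
    top = np.argsort(-np.abs(w))[:5]
    print(cls, [(int(i), round(float(w[i]), 3)) for i in top])
```

With the labels encoded this way, a positive weight pushes the decision function toward "this class" and a negative weight pushes it toward "the rest", so the sign is printed alongside the magnitude.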
Could you please tell me if that is a reasonable approach? Although I could claim something like "features A, B and C are responsible for discriminating between classes 1 and 2", I still wouldn't be able to say "a high value of feature A is indicative of class 1"...
I also thought of removing features one at a time and re-training the classifier to see whether that has any impact on precision and recall for individual classes, but the problem is that I have 400 features...
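A rough sketch of that leave-one-feature-out idea, again assuming scikit-learn's LinearSVC and synthetic data with only 10 features so that it runs quickly; the train/test split and the helper name per_class_scores are hypothetical choices for illustration. With 400 features the loop would mean 400 re-trainings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the corpus: 3 classes, 10 features (hypothetical sizes).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
labels = [0, 1, 2]


def per_class_scores(cols):
    """Train on the given feature columns; return a 2x3 array (precision, recall) per class."""
    clf = LinearSVC(dual=False, max_iter=5000, random_state=0)
    clf.fit(X_tr[:, cols], y_tr)
    pred = clf.predict(X_te[:, cols])
    prec = precision_score(y_te, pred, labels=labels, average=None, zero_division=0)
    rec = recall_score(y_te, pred, labels=labels, average=None, zero_division=0)
    return np.vstack([prec, rec])


all_cols = list(range(X.shape[1]))
baseline = per_class_scores(all_cols)

# For each feature, retrain without it and record the per-class drop in scores.
drops = {}
for j in all_cols:
    kept = [c for c in all_cols if c != j]
    drops[j] = baseline - per_class_scores(kept)

for j, d in drops.items():
    print(j, "precision drop:", np.round(d[0], 3), "recall drop:", np.round(d[1], 3))
```

A positive entry means the score for that class fell when the feature was removed; a single split is noisy, so small differences should not be over-read.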
How would you go about it?
Thanks in advance!