How to get most significant tokens for each label in Fasttext supervised classification model?

Question

I've trained a Fasttext model using .train_supervised() and can't get my head around how to get the most important words for each label according to the model.

I have three labels so I would expect to be able to do something like

model.label["__label__1"].get_most_significant()

Any suggestions on how to go about achieving this?

I'm looking for this too and am trying it on my own. Let's see whether any experts have done something similar. — Karthik Sunil, Jul 30 '20 at 14:37
Maybe https://github.com/marcotcr/lime can provide something _similar_ to what you ask, that can serve your use case (?) — Davide Fiocco, Mar 01 '21 at 11:24

score 1 · Answer 1 · answered Aug 03 '20 at 03:42

I've not noticed any such feature in the original FastText code, so wouldn't expect it in the Python wrapper, either.

You might be able to get something vaguely like what you want by the process:

1. For every individual word, do a predict-with-probabilities of the top k labels for that one-word text – with k possibly as large as the count of all labels.
2. From each such label prediction, add that word, with the label probability, into a log for that label.
3. Sort that log for each label to put the word that gave the highest probability first; take the top n results as the words most indicative of that label.
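
The steps above could be sketched roughly as follows. This is only a heuristic, not a built-in fastText feature. It assumes the model object exposes `get_words()`, `get_labels()` and `predict(text, k=...)` as the fastText Python bindings do. The function name `top_words_per_label` and the file name `model.bin` are made up for illustration. Scoring one-word texts ignores any word n-gram context the model may have learned, so treat the ranking as indicative only.

```python
from collections import defaultdict


def top_words_per_label(model, n=10):
    """Rank vocabulary words by the probability the model assigns to each
    label when the word is classified on its own (hypothetical helper)."""
    labels = model.get_labels()
    log = defaultdict(list)  # label -> list of (word, probability)

    for word in model.get_words():
        if word == "</s>":
            # Assumption: skip fastText's end-of-sentence token,
            # which is not a real word.
            continue
        # Ask for all labels so every label receives a score for this word.
        pred_labels, pred_probs = model.predict(word, k=len(labels))
        for label, prob in zip(pred_labels, pred_probs):
            log[label].append((word, float(prob)))

    # Highest probability first; keep the top n words for each label.
    return {
        label: sorted(log[label], key=lambda wp: wp[1], reverse=True)[:n]
        for label in labels
    }


# Possible usage, assuming a model saved earlier as "model.bin":
#   import fasttext
#   model = fasttext.load_model("model.bin")
#   for label, words in top_words_per_label(model, n=20).items():
#       print(label, words)
```

With a large vocabulary this makes one prediction per word, so it may be worth restricting the loop to the most frequent words first.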

Hi Gojomo, first of all, thanks for answering :). I had already started along similar lines. I will update once I complete that. (thumbs up) — Karthik Sunil, Aug 06 '20 at 15:32
