I'm using a BERT-BiLSTM-CRF model for sequence labeling. Now I want to distill this heavy model into a much smaller one, such as an LSTM-CRF. After surveying the relevant papers, I found that almost all of the proposed solutions are based on softmax output rather than CRF output. Is there a solution for distilling CRF output?
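For reference, the softmax-based approaches mentioned above usually reduce to a token-level loss like the one below. This is a minimal NumPy sketch, not taken from any particular paper. It assumes that the teacher and the student each emit one score vector per token, with shape `(seq_len, num_labels)`. The function names `log_softmax` and `token_kd_loss` are hypothetical. A CRF layer instead scores whole label sequences and is decoded to a single Viterbi path, so it does not directly provide these per-token distributions. That is why the loss does not carry over to CRF output as-is.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the label axis (last axis)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max to avoid overflow in exp
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def token_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Token-level distillation loss: KL(teacher || student), averaged over tokens.

    Hypothetical helper. Inputs are per-token label scores of shape
    (seq_len, num_labels), i.e. the softmax setting, not CRF sequence scores.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    p_t = np.exp(log_p_t)
    kl_per_token = np.sum(p_t * (log_p_t - log_p_s), axis=-1)
    # Scale by T^2 (Hinton et al., 2015) so gradient magnitudes stay
    # comparable across temperatures
    return float(kl_per_token.mean() * temperature ** 2)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 4))  # 5 tokens, 4 labels (random placeholder scores)
student = rng.normal(size=(5, 4))
print(token_kd_loss(teacher, student))  # positive, since the distributions differ
print(token_kd_loss(teacher, teacher))  # 0.0 when the student matches the teacher
```

In a real setup, the student's per-token scores (for example, the LSTM emission scores before the CRF layer) would be trained with this loss alongside the usual supervised objective. Whether that is appropriate for a CRF student is exactly the open question here.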