4

I'm using a BERT-BiLSTM-CRF model for sequence labeling. I now want to distill this heavy model into a much smaller one, such as an LSTM-CRF. After surveying the relevant papers, I found that almost all of the solutions are based on softmax output rather than CRF output. Is there a solution for distilling from CRF output?
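
For context, the softmax-based distillation mentioned above can be sketched roughly as follows. This is a minimal pure-Python illustration, not code from any particular paper: the function names, the default temperature of 2.0, and the T² scaling of the loss are assumptions made for the example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over one token's label logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def token_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean per-token KL(teacher || student) for one sentence.

    Both arguments are lists of per-token logit lists with the same
    shape (num_tokens x num_labels). The T**2 factor is a common
    convention for keeping gradient magnitudes comparable across
    temperatures; it is an assumption here, not a requirement.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need two non-empty logit sequences of equal length")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher distribution
        log_q = log_softmax(s_row, temperature)  # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits) * temperature ** 2


# Hypothetical logits for a 2-token sentence with 3 labels.
teacher = [[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]]
student = [[2.0, 1.5, 0.5], [0.5, 2.0, 0.3]]
print(token_distillation_loss(teacher, student))
```

This loss needs an independent label distribution for every token. A CRF layer instead scores whole label sequences, so it has no per-token softmax to plug in directly, which is the gap I'm asking about.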

lanxu

0 Answers