
I am trying to fine-tune a BERT model for an NER tagging task using the TensorFlow official NLP toolkit. I found there is already a BertTokenClassifier class, which I wanted to use. Looking at the code inside, I don't see any masking to prevent tag prediction and loss calculation for padding and [SEP] tokens. I think this prevention is possible; I just don't know how. I want to prevent it for faster training, and one blog also mentioned some weird behaviour when the loss is not masked.
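To make it concrete, this is roughly the behaviour I'm after (just a sketch of my intent; the function name, the argument names, and the [SEP] token id are my assumptions, not the toolkit's API):

```python
import tensorflow as tf

def masked_tagging_loss(labels, logits, input_word_ids, input_mask,
                        sep_token_id=102):
    """Cross-entropy averaged only over real content tokens.

    Assumes [SEP] is token id 102 (true for the standard English BERT
    vocab) and that labels at ignored positions hold some valid class
    id such as 0, so the unmasked per-token loss is still computable.
    """
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
    per_token = loss_fn(labels, logits)  # shape: (batch, seq_len)
    # 1.0 at real tokens, 0.0 at padding and at [SEP] positions.
    mask = tf.cast(input_mask, tf.float32) * tf.cast(
        tf.not_equal(input_word_ids, sep_token_id), tf.float32)
    return tf.reduce_sum(per_token * mask) / tf.reduce_sum(mask)
```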

Does anybody have any idea about this?

Mani Rai

1 Answer


Have you found a solution? I'm doing the same task, and I found the padding token was dominating the predictions. Passing in an attention mask didn't do anything, so I manually truncated the sequences to just 100 tokens, and that improved things.

Shane Feng
  • Take a look at the custom loss function in this blog: https://keras.io/examples/nlp/ner_transformers/#compile-and-fit-the-model – Mani Rai May 19 '22 at 17:54
  • I don't understand the code completely, but it seems like it's skipping the losses of the special tokens, including padding (using masking), while calculating the average loss; roughly the pattern sketched below. – Mani Rai May 19 '22 at 18:02
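
For reference, the masking pattern in that blog looks roughly like this (a paraphrase rather than the blog's exact code; it assumes label 0 marks padding/special positions and that the model outputs probabilities, hence from_logits=False):

```python
import tensorflow as tf
from tensorflow import keras

class NonPaddingTokenLoss(keras.losses.Loss):
    """Averages cross-entropy only over positions whose label is non-zero."""

    def call(self, y_true, y_pred):
        loss_fn = keras.losses.SparseCategoricalCrossentropy(
            from_logits=False, reduction=keras.losses.Reduction.NONE)
        per_token = loss_fn(y_true, y_pred)
        mask = tf.cast(y_true > 0, tf.float32)  # zero out padding labels
        # Sum of masked per-token losses divided by the count of real tokens.
        return tf.reduce_sum(per_token * mask) / tf.reduce_sum(mask)
```

With the tag labels encoded so that padding gets id 0, this can be passed directly to model.compile(loss=NonPaddingTokenLoss()).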