How to mask [PAD] and [SEP] tokens to prevent their prediction and loss calculation for NER task on BERT models?

Question

I am trying to fine-tune a BERT model for an NER tagging task using the TensorFlow official NLP toolkit. I found there is already a BERT token classifier class, which I wanted to use. Looking at the code inside, I don't see any masking to prevent tag prediction and loss calculation for padding and [SEP] tokens. I think this can be prevented, but I don't know how. I want to do this for faster training, and also because a blog post mentioned some weird behaviour when these tokens are not masked.

Does anybody have any idea about this?

score 0 · Answer 1 · answered May 19 '22 at 06:34


Have you found a solution? I'm doing the same task and found that the padding token dominates the predictions. Passing in an attention mask didn't change anything, so I manually truncated the sequences to just 100 tokens, and that improved the results.

answered May 19 '22 at 06:34

Shane Feng


Take a look at the custom loss function in this blog post: https://keras.io/examples/nlp/ner_transformers/#compile-and-fit-the-model – Mani Rai May 19 '22 at 17:54
I don't understand the code completely, but it seems to skip the losses of special tokens, including padding (using a mask), when calculating the average loss. – Mani Rai May 19 '22 at 18:02
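
The masking idea described in the comment above can be sketched as a custom Keras loss. This is a minimal illustration, not the code from the linked post or from the official NLP toolkit. It assumes that label id 0 is reserved for every position to ignore ([PAD], [CLS], [SEP]), that the model outputs logits of shape (batch, seq_len, num_tags), and the names `IGNORE_LABEL` and `masked_token_loss` are made up for this example.

```python
import tensorflow as tf

# Assumption: label id 0 marks positions to ignore ([PAD], [CLS], [SEP]).
IGNORE_LABEL = 0


def masked_token_loss(y_true, y_pred):
    """Mean cross-entropy over positions whose label is not IGNORE_LABEL."""
    y_true = tf.cast(y_true, tf.int32)
    # Per-token loss of shape (batch, seq_len); y_pred is assumed to be logits.
    per_token = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=True
    )
    # 1.0 for real tokens, 0.0 for ignored positions.
    mask = tf.cast(tf.not_equal(y_true, IGNORE_LABEL), per_token.dtype)
    # Average over real tokens only; guard against an all-ignored batch.
    return tf.reduce_sum(per_token * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)


# Tiny demo: one sequence of 5 positions, 3 tag classes.
labels = tf.constant([[0, 2, 1, 0, 0]])  # positions 0, 3 and 4 are ignored
logits = tf.random.normal((1, 5, 3), seed=0)
loss = masked_token_loss(labels, logits)
print(float(loss))
```

A function like this could be passed as `model.compile(loss=masked_token_loss, ...)`. Note that it only removes the ignored positions from the loss; the model still emits a prediction at each of them, so at inference time those predictions would need to be dropped with the same mask. It also does not shorten the forward pass, so any speed-up would have to come from shorter padded sequences.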
