I am trying to fine-tune BERT model for NER tagging task using tensorflow official nlp toolkit. I found there's already a bert token classifier class which i wanted to use. Looking at the code inside, I don't see any masking to prevent tag prediction and loss calculation for paddings and [SEP] token. I think the prevention is possible, just I don't know how? I wanted to prevent this for faster training and also one of the blog mentioned some weird behaviour when not masked.
Anybody has any idea about this?