I am trying to train a masked language model from scratch. I use the code below to create the RoBERTa model architecture, but when I compared it with RobertaLM, I found that it does not have the GELU activation layer. Could someone help explain how to do this correctly? Thanks.
from transformers import RobertaConfig, RobertaForMaskedLM

# Configuration similar to roberta-base (12 layers, 12 attention heads)
config = RobertaConfig(
    vocab_size=50265,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=12,
    type_vocab_size=1,
)

# Randomly initialized model for masked language modeling
model = RobertaForMaskedLM(config=config)
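For reference, this is roughly how I am doing the comparison; a minimal sketch, assuming the pretrained roberta-base checkpoint as the reference model:

from transformers import RobertaForMaskedLM

# Pretrained reference model to compare the from-scratch architecture against
reference = RobertaForMaskedLM.from_pretrained("roberta-base")

# Print both module trees and compare them side by side
print(reference)
print(model)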