I am doing research on pre-trained language models (LMs), specifically the following:
- BERT
- ALBERT
- RoBERTa
- XLNet
- DistilBERT
- BigBird
- ConvBERT
I am looking for information to compare these LMs, such as the number of parameters, the number of layers, the data they were pre-trained on, and so on.
In other words, I want to extend the following table to the other LMs:
But I can't find this information online! Can you please help me?
Thanks.