I want to pre-train BERT from scratch on my own domain data. Here is some basic information about the data:
- millions of texts, with a vocabulary of around 60,000 tokens;
- every text is a sequence of numbers, like "2233 3454 3679";
- the longest text has fewer than 15 numbers, and most have only 3 or 4;
- each text is an AS-Path taken from BGP updates or announcements.
So now I'm wondering: are there any tricks for pre-training BERT on this type of data (very short sequences of numeric tokens) to get good performance?
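
To make the setup concrete, here is a minimal sketch of how such data could be tokenized at the word level (one token per AS number) and padded to a short fixed length. The helper names `build_vocab` and `encode`, the special-token list, the example paths, and `max_len=16` are illustrative assumptions, not a settled design.

```python
from collections import Counter

# Special tokens in the usual BERT style; the exact set is an assumption.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(texts):
    """Map every whitespace-separated AS number to an integer id."""
    counts = Counter(tok for text in texts for tok in text.split())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=16):
    """Return (input_ids, attention_mask), both padded to max_len."""
    # Reserve two positions for [CLS] and [SEP].
    tokens = ["[CLS]"] + text.split()[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    pad = max_len - len(ids)
    attention_mask = [1] * len(ids) + [0] * pad
    input_ids = ids + [vocab["[PAD]"]] * pad
    return input_ids, attention_mask


# Made-up AS paths in the same format as the data described above.
paths = ["2233 3454 3679", "2233 174", "3454 3679 174 6939"]
vocab = build_vocab(paths)
input_ids, attention_mask = encode("2233 3454 3679", vocab)
```

Since the longest path has fewer than 15 numbers, a `max_len` of 16 leaves room for `[CLS]` and `[SEP]` without truncating anything. Treating each AS number as a single token also keeps a number like "2233" from being split into sub-word pieces.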