I am having a hard time understanding the process of building a bag-of-words. This will be a multiclass classfication supervised machine learning problem wherein a webpage or a piece of text is assigned to one category from multiple pre-defined categories. Now the method that I am familiar with when building a bag of words for a specific category (for example, 'Math') is to collect a lot of webpages that are related to Math. From there, I would perform some data processing (such as remove stop words and performing TF-IDF) to obtain the bag-of-words for the category 'Math'.
Question: Another method that I am thinking of is to instead search in google for something like 'List of terms related to Math' to build my bag-of-words. I would like to ask if this is method is okay?
Another question: In the context of this question, does bag-of-words and corpus mean the same thing?
Thank you in advance!