I am using LDA for topic modelling, but my data is heavily skewed. I have documents from 10 different categories and would like each category to contribute equally to the LDA topics.
However, the number of documents varies widely between categories: one category, for example, holds more than 50% of all documents, while several categories hold only 1-2% each.
What would be the best approach to assigning weights to these categories so that they contribute equally to my topics? If I run LDA without doing so, my topics will be dominated by the category that holds over 50% of the documents in the corpus. I am exploring up-sampling, but would prefer a solution that assigns weights directly within LDA.
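
For reference, here is a minimal sketch of the up-sampling I am exploring. It assumes the corpus is a list of documents with a parallel list of category labels; the function name `upsample_by_category` and the toy data are made up for illustration and are not part of any LDA library. Each smaller category is resampled with replacement until it matches the size of the largest one.

```python
import random
from collections import defaultdict


def upsample_by_category(docs, labels, seed=0):
    """Hypothetical helper: resample with replacement so that every
    category ends up with as many documents as the largest category."""
    rng = random.Random(seed)

    # Group documents by their category label.
    by_category = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_category[label].append(doc)

    # Every category is grown to the size of the largest one.
    target = max(len(group) for group in by_category.values())

    balanced_docs, balanced_labels = [], []
    for label, group in by_category.items():
        # Keep all original documents, then draw the shortfall at random.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for doc in group + extra:
            balanced_docs.append(doc)
            balanced_labels.append(label)
    return balanced_docs, balanced_labels


# Toy corpus (assumed data): one dominant category and two small ones.
docs = ["a b", "a c", "b c", "a a", "x y", "z z"]
labels = ["big", "big", "big", "big", "small", "tiny"]

balanced_docs, balanced_labels = upsample_by_category(docs, labels)
```

The balanced lists would then be vectorized and passed to LDA as usual. The drawback is that small categories are represented by many exact duplicates, which is part of why I would prefer a weighting applied inside the model.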