I created a word sentiment app using the Naive Bayes algorithm.
The training data falls into two classes: positive and negative. For each class, I extract the unique words from its training data, so I end up with the full set of unique words per class. Then I calculate the probability of occurrence of each unique word within its class.
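Roughly, the training step looks like the sketch below. The function name `train_word_probs`, the toy documents, and the choice to estimate each probability as the word's count divided by the total word count of its class are illustrative assumptions, not my actual code.

```python
from collections import Counter

def train_word_probs(docs):
    """Estimate P(word | class) from the tokenized documents of one class.

    Assumption: probability = occurrences of the word / all word
    occurrences in this class (no smoothing applied).
    """
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical toy data: each document is a list of words.
positive_docs = [["good", "great", "good"], ["nice", "good"]]
negative_docs = [["bad", "awful"], ["bad", "poor", "bad"]]

pos_probs = train_word_probs(positive_docs)
neg_probs = train_word_probs(negative_docs)
```

Each dictionary maps a unique word of one class to its estimated probability, and the values within a class sum to 1.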
The problem appears when the training data is unbalanced. For example, if I use 60% negative and 40% positive training data, the results on the test data skew toward negative, and vice versa.
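To illustrate the skew, here is a small sketch under the assumption that the class prior is taken from each class's share of the training data, which is the standard Naive Bayes setup and may be where the bias comes from. The names `log_posterior` and `classify`, the word probabilities, and the fallback value for unseen words are all hypothetical.

```python
import math

def log_posterior(words, word_probs, prior, unseen=1e-6):
    """log P(class) + sum of log P(word | class).

    `unseen` is an assumed fallback probability for words that never
    appeared in this class during training.
    """
    score = math.log(prior)
    for word in words:
        score += math.log(word_probs.get(word, unseen))
    return score

def classify(words, pos_probs, neg_probs, pos_prior, neg_prior):
    pos = log_posterior(words, pos_probs, pos_prior)
    neg = log_posterior(words, neg_probs, neg_prior)
    return "positive" if pos > neg else "negative"

# Hypothetical word that leans slightly positive.
pos_word_probs = {"fine": 0.12}
neg_word_probs = {"fine": 0.10}

# Balanced training data: the word evidence decides the label.
balanced = classify(["fine"], pos_word_probs, neg_word_probs, 0.5, 0.5)

# 40% positive / 60% negative: the prior outweighs the word evidence.
skewed = classify(["fine"], pos_word_probs, neg_word_probs, 0.4, 0.6)
```

With these numbers the same input comes out positive under balanced priors and negative under the 60/40 split, which matches the behavior I am seeing.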
Apart from balancing the training data, what can I do to solve this problem? Is there an additional method I should add?