As we know, unbalanced samples cause problems and take extra effort to handle.
While dealing with this, I got confused about what "balanced" actually means. Say I have a training dataset of 200 cats, 200 dogs and 400 stones. If I classify into 3 classes, I should use 200 cats, 200 dogs and 200 stones. But how should I allocate the samples when I only want to classify 2 classes, pets and stones?
Should I go with 400 pets (200 cats & 200 dogs) and 400 stones, so that the classes pets and stones have the same number of samples?
Or should I go with 400 pets (200 cats & 200 dogs) and 200 stones, so that every inner class has the same probability of being seen? After all, cats and dogs are essentially different.
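
To make the two options concrete, here is a minimal Python sketch of what I mean. The helper `balanced_indices` and the `to_coarse` mapping are names I made up for illustration, and random undersampling is just one assumed way to do the balancing. The only difference between the two options is the level at which the counts are equalized: the coarse class (pet/stone) or the inner class (cat/dog/stone).

```python
import random
from collections import Counter, defaultdict


def balanced_indices(labels, key, seed=0):
    """Undersample so every bucket, as defined by key(label), keeps the same count."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for i, label in enumerate(labels):
        buckets[key(label)].append(i)
    # Every bucket is cut down to the size of the smallest bucket.
    n = min(len(idxs) for idxs in buckets.values())
    kept = []
    for idxs in buckets.values():
        kept.extend(rng.sample(idxs, n))
    return sorted(kept)


# The dataset from the question: 200 cats, 200 dogs, 400 stones.
labels = ["cat"] * 200 + ["dog"] * 200 + ["stone"] * 400
to_coarse = {"cat": "pet", "dog": "pet", "stone": "stone"}

# First option: equalize the coarse classes (pets vs. stones).
option_a = balanced_indices(labels, key=to_coarse.get)
# Second option: equalize the inner classes (cats vs. dogs vs. stones).
option_b = balanced_indices(labels, key=lambda label: label)

fine_a = Counter(labels[i] for i in option_a)
fine_b = Counter(labels[i] for i in option_b)
coarse_a = Counter(to_coarse[labels[i]] for i in option_a)
coarse_b = Counter(to_coarse[labels[i]] for i in option_b)

print("balance coarse classes:", dict(coarse_a), dict(fine_a))
print("balance inner classes: ", dict(coarse_b), dict(fine_b))
```

With the counts above, the first call keeps all 400 pets and all 400 stones, while the second keeps 400 pets and only 200 stones. The sketch only shows how the two allocations differ; it does not say which one is better for training.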