I’m reading an engineering paper. The authors have a labelled dataset that is imbalanced: there are many more instances labelled A than B. They want to train a classifier that predicts the A or B label from some inputs (states).
The authors say:
To artificially remedy this problem, random replicas of the B states are incorporated into the dataset to even out the lot.
I don’t know much about data analytics, but this doesn’t sound very valid to me. Is it?
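
For concreteness, here is a minimal sketch of what I understand the authors to be doing, namely random oversampling of the minority class B by duplicating its rows until the classes are even. The feature names (`state_1`, `state_2`) and the toy data are made up by me, not taken from the paper:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 90 instances labelled A, 10 labelled B.
df = pd.DataFrame({
    "state_1": rng.normal(size=100),
    "state_2": rng.normal(size=100),
    "label":   ["A"] * 90 + ["B"] * 10,
})

majority = df[df["label"] == "A"]
minority = df[df["label"] == "B"]

# Draw B rows with replacement until there are as many as A rows,
# then append these replicas to the original data.
replicas = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, replicas], ignore_index=True)

print(balanced["label"].value_counts())  # A: 90, B: 90
```

Is balancing the classes by duplicating minority-class rows like this a legitimate thing to do before training?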