I have a training set that looks something like this:
Features: categorical/numerical
Output: binary (1/0)
[1] feature[1][1] feature[1][2] ... feature[1][j]
[2] feature[2][1] feature[2][2] ... feature[2][j]
.
.
.
[i] feature[i][1] feature[i][2] ... feature[i][j]
Suppose some samples (rows) have "good" value combinations that are likely to yield similar outputs, whereas others have "bad" value combinations and are therefore difficult to predict.
My goal is to improve the final accuracy by removing the bad samples, which lack regularity. Can someone tell me the best algorithm or preprocessing step to automatically detect those samples, so that the model is trained only on the good ones? Thank you in advance!
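To make the idea concrete, here is a rough sketch of the kind of filter I have in mind. It is only an illustration, written in plain Python rather than my actual MXNet/R setup. It flags a sample when its label disagrees with the majority label of its k nearest neighbours, which I believe is close to what is called edited nearest neighbours. The function names, the Gower-style mixed distance, k=3, and the toy data are all assumptions made for this example, not part of my real pipeline.

```python
from collections import Counter


def mixed_distance(a, b, ranges):
    """Gower-style distance: range-scaled gap for numbers, 0/1 mismatch for categories."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:  # categorical feature
            total += 0.0 if x == y else 1.0
        else:  # numerical feature, scaled by the column's range
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(a)


def flag_irregular(rows, labels, k=3):
    """Return indices of samples whose label disagrees with the majority of their k nearest neighbours."""
    n_features = len(rows[0])
    ranges = []
    for j in range(n_features):
        col = [row[j] for row in rows]
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(max(col) - min(col))  # numerical column
        else:
            ranges.append(None)  # treat everything else as categorical
    flagged = []
    for i, row in enumerate(rows):
        # Rank all other samples by distance and keep the k closest.
        neighbours = sorted(
            (idx for idx in range(len(rows)) if idx != i),
            key=lambda idx: mixed_distance(row, rows[idx], ranges),
        )[:k]
        majority = Counter(labels[idx] for idx in neighbours).most_common(1)[0][0]
        if majority != labels[i]:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): one categorical and one numerical feature.
rows = [
    ("a", 1.0), ("a", 1.1), ("a", 0.9), ("a", 1.05),
    ("b", 5.0), ("b", 5.2), ("b", 4.9), ("b", 5.1),
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

flagged = flag_irregular(rows, labels, k=3)
print(flagged)  # [3]
```

Here sample 3 sits among the "a" rows labelled 1 but carries label 0, so it is the only one flagged. What I am unsure about is whether dropping such rows before training is sound, or whether it just hides the hard cases from the model.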
ENV: MXNet, R