
I have a dataset with around 10 integer features, and I want to remove outliers from each feature. What I have done in the past is compute the mean and standard deviation of each feature and make a pass over the dataset for that feature, discarding rows that qualify as outliers. Repeating this for every column/feature removes every row that has at least one outlier feature.
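The per-feature procedure described above can be sketched roughly as follows. This is a minimal sketch under assumptions not stated in the question: a value counts as an outlier when it lies more than `k` population standard deviations from its feature's mean (the default `k=3.0` and the function name are hypothetical), and each later feature's statistics are computed on the rows that survived the earlier features.

```python
from statistics import fmean, pstdev


def remove_outliers_per_feature(rows, k=3.0):
    """Drop rows with an outlier in any feature, handling one feature at a time.

    Hypothetical criterion: a value is an outlier when it is more than
    k population standard deviations away from its feature's mean.
    """
    if not rows:
        return []
    n_features = len(rows[0])
    for j in range(n_features):
        if not rows:
            break
        # Statistics for feature j, computed from the rows still remaining.
        column = [row[j] for row in rows]
        mean, std = fmean(column), pstdev(column)
        # Separate pass for feature j: keep rows whose j-th value is within k*std.
        rows = [row for row in rows if abs(row[j] - mean) <= k * std]
    return rows
```

Each feature costs its own statistics pass plus its own filtering pass, which is the repeated parsing the question wants to avoid.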

Since parsing the dataset multiple times is not optimal, I am looking for a more computationally efficient approach. Can someone suggest a better way, so that the dataset is parsed once and all outlier rows are removed?
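One possible direction, sketched here under assumptions: compute the mean and standard deviation of every feature together in a single pass (using Welford's online algorithm), then make one filtering pass that drops any row with at least one outlier feature. Because the statistics depend on every row, the filtering still has to follow the statistics pass, but the total is two passes regardless of the number of features. The outlier criterion (more than `k` population standard deviations from the feature mean), the default `k=3.0`, and both function names are hypothetical.

```python
import math


def column_stats(rows):
    """One pass over rows: mean and population std of every column (Welford)."""
    count = 0
    means = None
    m2 = None  # running sums of squared deviations, one per column
    for row in rows:
        if means is None:
            means = [0.0] * len(row)
            m2 = [0.0] * len(row)
        count += 1
        for j, x in enumerate(row):
            delta = x - means[j]
            means[j] += delta / count
            m2[j] += delta * (x - means[j])
    if count == 0:
        return [], []
    stds = [math.sqrt(s / count) for s in m2]
    return means, stds


def remove_outlier_rows(rows, k=3.0):
    """Second pass: keep rows whose every feature is within k stds of its mean.

    `rows` must be re-iterable (e.g. a list), since it is traversed twice.
    """
    means, stds = column_stats(rows)
    return [
        row for row in rows
        if all(abs(x - m) <= k * s for x, m, s in zip(row, means, stds))
    ]
```

Note that this computes every feature's statistics once on the full dataset, whereas a loop that filters feature by feature recomputes statistics on the surviving rows, so the two approaches can drop slightly different rows.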

disha
