I have a dataset of 250,000 points with 15 features. Each feature takes values from 0 to 100.

I want to fit a probability distribution to this dataset so that I can identify outliers, such as wrong data entries.

For the univariate case there is `fitdist` in R. What is the equivalent for the multivariate case?

How can I do this effectively in R or Python?
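To make the question concrete, here is a minimal Python sketch of one common starting point: fit a single multivariate Gaussian (sample mean and covariance) and flag rows whose squared Mahalanobis distance is unusually large. The Gaussian assumption is only an assumption here; features bounded in 0 to 100 may well be skewed or multimodal, in which case a mixture model or a robust covariance estimate might be more suitable. The function name `mahalanobis_outliers`, the `alpha` threshold, and the simulated data are hypothetical and purely illustrative.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, alpha=0.001):
    """Flag rows that are unlikely under a fitted multivariate Gaussian.

    Assumption: the data are roughly multivariate normal, so the squared
    Mahalanobis distance approximately follows a chi-square distribution
    with one degree of freedom per feature.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = X - mean
    # Squared Mahalanobis distance of every row, computed in one pass.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    cutoff = chi2.ppf(1.0 - alpha, df=X.shape[1])
    return d2 > cutoff, d2


# Simulated stand-in for the real data: 15 features with values in [0, 100].
rng = np.random.default_rng(0)
X = rng.normal(50, 10, size=(5000, 15)).clip(0, 100)
X[0] = 100.0  # simulated wrong data entry

flags, d2 = mahalanobis_outliers(X)
print("rows flagged:", int(flags.sum()))
```

The flagged rows are candidates for manual inspection rather than confirmed errors, and the cutoff `alpha` would need tuning for the real dataset.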

curio17
  • Could you be more specific? With `R`, you could compute both the mean (`mean()`) and the standard deviation (`sd()`) of each feature, check whether your data entries lie within a specific range (such as mean ± 1.5 × standard deviation), and then examine the identified outliers further. – LAP May 30 '17 at 07:51
  • Most readers will not be familiar with lakh. Consider rephrasing if you want to communicate well. – Glen_b May 30 '17 at 09:22

0 Answers