I need to remove some outliers from two variables in my dataset. My idea is to replace those outliers with the value of the nearest fence, Q1 - 1.5*IQR or Q3 + 1.5*IQR. Is there a function available to do this, or how can I create a function that replaces the values of the observations that exceed Q3 + 1.5*IQR with the value of Q3 + 1.5*IQR itself? Thank you in advance.
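A minimal base-R sketch of such a function, assuming the usual Tukey fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR) computed with `quantile`'s default definition. The helper name `cap_outliers` and the columns `var1` and `var2` are hypothetical. It only shows the mechanics of clamping values to the fences with `pmax`/`pmin`; whether you should do this at all is questioned in the comments.

```r
# Hypothetical helper: clamp x to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
# k = 1.5 is the conventional choice; NAs are ignored for the quartiles and kept as NA.
cap_outliers <- function(x, k = 1.5, na.rm = TRUE) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  # pmax raises values below the lower fence; pmin lowers values above the upper fence
  pmin(pmax(x, lower), upper)
}

# Made-up data frame with two hypothetical variables
df <- data.frame(var1 = c(1:9, 50), var2 = c(-40, 2:10))

# Apply the helper to both columns at once
df[c("var1", "var2")] <- lapply(df[c("var1", "var2")], cap_outliers)
print(df)
```

With this toy data, 50 in `var1` is replaced by its upper fence 14.5, and -40 in `var2` is replaced by its lower fence -5; all other values are unchanged.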
-
You should reconsider this. It looks extremely dubious from the perspective of a statistician. – Roland Jun 02 '20 at 12:46
-
Could you give me any idea of what I could do instead, please? – Ricardo Xavier Torres Ortiz Jun 02 '20 at 12:48
-
Technically, this is pretty easy to cobble together using `ifelse` and `quantile`, but @Roland has a very good point. Replacing a dataset with one more to your liking is arbitrary. – John Coleman Jun 02 '20 at 12:49
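The `ifelse` + `quantile` approach mentioned here could look roughly like this, capping only the upper tail as the question literally asks. The vector `x` is made-up example data, and this is a sketch rather than a recommended procedure.

```r
x <- c(2, 4, 4, 5, 6, 7, 8, 30)  # made-up data

# First and third quartiles (quantile's default type 7 definition)
q <- quantile(x, c(0.25, 0.75), names = FALSE)

# Upper Tukey fence: Q3 + 1.5 * IQR
upper <- q[2] + 1.5 * (q[2] - q[1])

# Replace only the values above the upper fence with the fence itself
x_capped <- ifelse(x > upper, upper, x)
print(x_capped)
```

Here the upper fence is 12.125, so 30 becomes 12.125 and everything else is unchanged. On real data containing `NA`, `quantile` stops with an error unless you pass `na.rm = TRUE`.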
-
Outliers are always relative to a distribution model. Often, you should use a different distribution model (such as a generalized linear model instead of an ordinary (Gaussian) linear model) rather than "handling outliers". More specific advice would need more information (why do you believe you have outliers, what do you plan to do with the dataset ...) – Roland Jun 02 '20 at 12:53
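To illustrate that an "outlier" only makes sense relative to an assumed distribution (this example is my own, not from the comment), the same 1.5*IQR rule can flag a point on the raw scale but nothing on the log scale when the data are right-skewed. `outside_fences` is a hypothetical helper and the data are synthetic.

```r
# Hypothetical helper: TRUE where x lies outside Tukey's fences
outside_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Evenly spaced on the log scale, i.e. right-skewed on the raw scale
x <- exp(seq(0, 4, by = 0.5))

raw_flags <- outside_fences(x)       # rule applied to the raw values
log_flags <- outside_fences(log(x))  # same rule applied on the log scale
print(which(raw_flags))
print(any(log_flags))
```

Only the largest value, exp(4) (about 54.6), is flagged on the raw scale; on the log scale the values are evenly spaced and none is flagged. Choosing a model that accounts for the skewness, rather than editing the values, is the direction the comment suggests.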
-
See this post (https://stackoverflow.com/questions/13339685/how-to-replace-outliers-with-the-5th-and-95th-percentile-values-in-r) if you still want to do it. – Ahorn Jun 02 '20 at 12:55
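The linked question is about replacing outliers with the 5th and 95th percentile values. A base-R sketch of that idea (not the code from that post), on made-up data:

```r
x <- c(1:19, 100)  # made-up data

# 5th and 95th percentiles (quantile's default type 7 definition)
p <- quantile(x, c(0.05, 0.95), names = FALSE)

# Clamp everything into [p5, p95]
x_capped <- pmin(pmax(x, p[1]), p[2])
print(x_capped)
```

Unlike the 1.5*IQR fences, percentile capping usually changes values at both ends even when they are not unusual: here 1 becomes 1.95, while 100 becomes 23.05.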
-
It might be more defensible to replace outliers with `NA`, or to impute with a value more representative of the non-outlier data (e.g., mean, median, random ... all stratified). You can use `pmin` (with `quantile`) to update your values, but I strongly agree with Roland that your method sounds suspect and will have negative ramifications either on your process or on your results. – r2evans Jun 02 '20 at 14:28
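A sketch of the two alternatives suggested here, marking outliers as `NA` or imputing them with the median of the non-outlying values, again using the 1.5*IQR rule on made-up data. A stratified version would repeat the same steps within each group.

```r
x <- c(3, 5, 6, 7, 8, 9, 60)  # made-up data

# Flag values outside Tukey's fences
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
is_out <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr

# Option 1: mark outliers as missing
x_na <- replace(x, is_out, NA)

# Option 2: impute outliers with the median of the non-outlying values
x_med <- replace(x, is_out, median(x[!is_out]))

print(x_na)
print(x_med)
```

With this data the fences are 1 and 13, so only 60 is affected: it becomes `NA` in `x_na` and 6.5 in `x_med`.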