I want to determine outliers in a data frame using quantiles and 1.5*IQR. I have used the boxplot function and compared the resulting outliers with the ones computed using quantile() and IQR().

I am noticing a difference between the two methods. The boxplot method detects fewer outliers than the Q1 - 1.5*IQR, Q3 + 1.5*IQR computation. I have tried setting range in boxplot to 1.5, but it still detects fewer outliers. Is range the correct boxplot option to set, or is there another option that I need to set?

Any help is greatly appreciated.

x <- c(-8.4849, -8.4848, -8.8485, -8.4848, -8.4848, -8.4848, -8.7879, -8.4848,
       -8.4849, -8.6061, -8.3838, -8.2424, -8.4849, -8.3636, -8.2424, -8.7273)
qnt <- quantile(x, probs = c(.25, .75))
iqt <- 1.5 * IQR(x)
x[x < (qnt[1] - iqt)]
# [1] -8.8485 -8.7879 -8.6061 -8.7273
x[x > (qnt[2] + iqt)]
# [1] -8.2424 -8.3636 -8.2424

boxplot(x, range = 1.5)$out
# [1] -8.8485 -8.7879 -8.2424 -8.2424 -8.7273
alaj
  • See the definition of "hinges" on the `?boxplot.stats` help page. The values from `boxplot` aren't +/- the quantiles, they are +/- the hinges. – MrFlick Aug 03 '16 at 18:02

1 Answer

Both the quantile() and IQR() functions in R have a "type" argument. There are 9(!) types of quantiles: types 1 to 3 are discontinuous, while types 4 to 9 interpolate between order statistics to make the originally discontinuous function continuous. The default is type 7. You can read the full definitions of the types in the quantile() documentation.

The exact definition of quantile used in boxplot() can be found in the boxplot.stats() documentation and it's close to the type 2 quantile.
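
As a rough illustration of the difference, here is a minimal sketch using the x from the question. It compares the hinges from fivenum() (which boxplot.stats() uses to compute its five-number summary) with the default type 7 quartiles. The variable names hinges and quartiles are just for this example.

```r
x <- c(-8.4849, -8.4848, -8.8485, -8.4848, -8.4848, -8.4848, -8.7879, -8.4848,
       -8.4849, -8.6061, -8.3838, -8.2424, -8.4849, -8.3636, -8.2424, -8.7273)

# fivenum() returns min, lower hinge, median, upper hinge, max
hinges <- fivenum(x)[c(2, 4)]

# Default quartiles (type = 7), as used in the question
quartiles <- quantile(x, probs = c(.25, .75), names = FALSE)

hinges           # about -8.5455 and -8.4343
quartiles        # about -8.5152 and -8.45955

diff(hinges)     # spread that boxplot() multiplies by range
diff(quartiles)  # same value as IQR(x)
```

For this sample the hinge spread is roughly twice the default IQR, so the boxplot fences sit further out and fewer points fall outside them.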

So, the answer is that there is no option to make boxplot() behave like quantile(), but there is an option to make quantile() behave (almost) like boxplot(): pass type = 2 to both quantile() and IQR().
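
A minimal sketch of that approach, again with the x from the question. It uses boxplot.stats() rather than boxplot() only to avoid drawing a plot; the names manual and reference are made up for this example. For this particular x the two results agree, but type 2 quantiles and hinges are not guaranteed to coincide for every sample size.

```r
x <- c(-8.4849, -8.4848, -8.8485, -8.4848, -8.4848, -8.4848, -8.7879, -8.4848,
       -8.4849, -8.6061, -8.3838, -8.2424, -8.4849, -8.3636, -8.2424, -8.7273)

# Quartiles and IQR computed with the type 2 definition
qnt <- quantile(x, probs = c(.25, .75), type = 2, names = FALSE)
iqt <- 1.5 * IQR(x, type = 2)

# Points outside the fences, kept in their original order
manual <- x[x < (qnt[1] - iqt) | x > (qnt[2] + iqt)]

# Outliers as boxplot() would report them (coef plays the role of range)
reference <- boxplot.stats(x, coef = 1.5)$out

manual
reference
identical(manual, reference)
```

If an exact match is needed for any sample size, taking the hinges directly from fivenum(x) avoids relying on the quantile type.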

dmitrienka