Suppose we have the following data set (length 28):

x <- c(30L, 49L, 105L, 115L, 118L, 148L, 178L, 185L, 196L, 210L, 236L, 236L,
278L, 287L, 329L, 362L, 366L, 399L, 430L, 434L, 451L, 451L, 477L, 488L, 508L,
531L, 533L, 542L)

If we calculate the five-number summary by hand, the minimum is 30, the maximum is 542, and the median is (287 + 329) / 2 = 308. That was the easy part!

  • Q1 is the median of the lower half [30, 49, 105, ..., 287] (length 14): Q1 = (178 + 185) / 2 = 181.5
  • Q3 is the median of the upper half [329, 362, ..., 542] (length 14): Q3 = (451 + 451) / 2 = 451
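The hand calculation above can be reproduced in R with a short sketch. The names `lower`, `upper`, `q1_halves` and `q3_halves` are illustrative only, and the split into two halves assumes an even number of observations (with an odd count, conventions differ on whether the median belongs to both halves). For this data set, base R's `fivenum()` returns the same values.

```r
x <- c(30L, 49L, 105L, 115L, 118L, 148L, 178L, 185L, 196L, 210L, 236L, 236L,
       278L, 287L, 329L, 362L, 366L, 399L, 430L, 434L, 451L, 451L, 477L, 488L,
       508L, 531L, 533L, 542L)

xs <- sort(x)                     # already sorted here, but do not rely on it
n  <- length(xs)                  # 28 observations

# Assumption: n is even, so the data split cleanly into two halves of n/2.
lower <- xs[1:(n %/% 2)]          # first 14 values
upper <- xs[(n %/% 2 + 1):n]      # last 14 values

q1_halves <- median(lower)        # (178 + 185) / 2 = 181.5
q3_halves <- median(upper)        # (451 + 451) / 2 = 451

# Min, lower hinge, median, upper hinge, max
fivenum(x)
```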

Now if we check that with summary(x), we get:

Min.   1st Qu.  Median    Mean    3rd Qu.    Max. 
30.0   183.2    308.0     309.7   451.0      542.0

Why do we get a different Q1? How does the function summary calculate Q1?

Ben Bolker
RAKY

1 Answer


There are (at least) nine ways to compute quantiles: see ?quantile. For this data set the nine methods lead to six unique results; two of the nine (types 2 and 5) give your answer of 181.5:

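# Compute the first quartile with each of the nine quantile types.
# (The argument is named `type` rather than `t` to avoid shadowing base::t.)
res <- sapply(1:9, function(type) quantile(x, 0.25, type = type))
names(res) <- 1:9
sort(res)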

##       1        3        4        6        8        9        2        5 
## 178.0000 178.0000 178.0000 179.7500 180.9167 181.0625 181.5000 181.5000 
##        7 
## 183.2500 

The default method in R is "type 7", which gives 183.25 (the value in summary is printed to slightly less precision, so appears as 183.2).
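To see where 183.25 comes from, here is a minimal sketch of the type 7 rule as described in ?quantile: the quantile sits at the fractional position (n - 1) * p + 1 in the sorted data, and the value is linearly interpolated between the two neighbouring order statistics. The names `h`, `lo`, `frac` and `q1_type7` are illustrative and not part of any R API.

```r
x <- c(30L, 49L, 105L, 115L, 118L, 148L, 178L, 185L, 196L, 210L, 236L, 236L,
       278L, 287L, 329L, 362L, 366L, 399L, 430L, 434L, 451L, 451L, 477L, 488L,
       508L, 531L, 533L, 542L)

xs <- sort(x)
n  <- length(xs)            # 28
p  <- 0.25

h    <- (n - 1) * p + 1     # fractional position: 27 * 0.25 + 1 = 7.75
lo   <- floor(h)            # 7, so interpolate between xs[7] and xs[8]
frac <- h - lo              # 0.75

# 178 + 0.75 * (185 - 178) = 183.25
q1_type7 <- xs[lo] + frac * (xs[lo + 1] - xs[lo])

# Compare with the built-in default (type 7) and with the
# "median of the lower half" answer, which type 2 reproduces here.
quantile(x, p, type = 7)
quantile(x, p, type = 2)
```

If the hand-calculated 181.5 is what you need, passing `type = 2` (or `type = 5`) to `quantile()` reproduces it for this data set, as the table above shows.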

Ben Bolker