I am using the quantreg package to predict quantiles and their confidence intervals. I can't understand why the predicted quantiles are different from the quantiles calculated directly from the data using quantile().

library(tidyverse)
library(quantreg)

data <- tibble(data=runif(10)*10)
qr1 <- rq(formula=data ~ 1, tau=0.9, data=data) #  quantile regression
yqr1<- predict(qr1, newdata=tibble(data=c(1)), interval='confidence', level=0.95, se='boot') # predict quantile
q90 <- quantile(data$data, 0.9) # quantile of sample

> yqr1
       fit    lower   higher
1 6.999223 3.815588 10.18286
> q90
     90% 
7.270891
Simon Woodward

1 Answer

You should realize that predicting the 90th percentile for a dataset with only 10 items is really based solely on the two highest values. You should review the help page for quantile, where you will find multiple definitions of the term.

When I run this (on a different random sample, since the code sets no seed), I see:

 yqr1<- predict(qr1, newdata=tibble(data=c(1)) ) 
 yqr1
       1 
8.525812 

And when I look at the data I see:

data
# A tibble: 10 x 1
         data
        <dbl>
 1 8.52581158
 2 7.73959380
 3 4.53000680
 4 0.03431813
 5 2.13842058
 6 5.60713159
 7 6.17525537
 8 8.76262959
 9 5.30750304
10 4.61817190

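So the rq function is estimating the second-highest value (8.525812) as the 90th percentile, which seems perfectly reasonable. The quantile result is not estimated that way: by default (type = 7) it interpolates linearly between the two highest values, so it lands slightly above the second-highest one: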

quantile(data$data, .9)
#     90% 
#8.549493 
?quantile
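
The interpolated value can be reproduced by hand. This is a minimal sketch, assuming the ten values printed in the tibble above and the documented default `type = 7`, which places the quantile at position `(n - 1) * p + 1` in the sorted sample and interpolates linearly between the neighbouring values. The names `xs`, `h`, `lo`, and `by_hand` are illustrative only.

```r
# The ten values printed in the tibble above
x <- c(8.52581158, 7.73959380, 4.53000680, 0.03431813, 2.13842058,
       5.60713159, 6.17525537, 8.76262959, 5.30750304, 4.61817190)

xs <- sort(x)
n  <- length(xs)
p  <- 0.9

# type = 7: fractional position of the quantile in the sorted sample
h  <- (n - 1) * p + 1   # 9.1 for n = 10, p = 0.9
lo <- floor(h)          # index of the order statistic just below

# Linear interpolation between the 9th and 10th sorted values
by_hand <- xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])

by_hand                  # hand-computed 90th percentile
unname(quantile(x, p))   # default quantile() for comparison
```

Here the position is 9.1, so the result sits one tenth of the way from the second-highest value to the highest, which is why it is slightly larger than the value reported by `predict()`.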
IRTFM
  • Excellent. I forgot that `quantile()` has the `type` argument. When I set `type=1` I get the same answer from both methods, which is sufficient for my purposes. – Simon Woodward Oct 26 '17 at 00:58
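
The effect of `type = 1` described in the comment can be checked with base R alone. This is a rough sketch, assuming the ten values from the answer. With `type = 1`, `quantile()` inverts the empirical distribution function and returns an observed value instead of an interpolated one. The names `q1` and `q7` are illustrative only.

```r
# The ten values printed in the answer
x <- c(8.52581158, 7.73959380, 4.53000680, 0.03431813, 2.13842058,
       5.60713159, 6.17525537, 8.76262959, 5.30750304, 4.61817190)

q1 <- unname(quantile(x, 0.9, type = 1))  # inverse empirical CDF, no interpolation
q7 <- unname(quantile(x, 0.9, type = 7))  # default, linear interpolation

q1            # an observed value from the sample
sort(x)[9]    # the second-highest value, for comparison
q7 - q1       # gap introduced by the default interpolation
```

For this sample the `type = 1` result is the second-highest observation, the same number that `predict()` returned in the answer, which matches what the comment reports.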